Preface



This volume of the Lecture Notes in Computer Science series contains the set of 
papers accepted for presentation at the 5th International Workshop on Quality 
of future Internet Services (QofIS 2004) and at the two one-day workshops co- 
located with QofIS 2004, namely the 1st International Workshop on QoS Routing 
(WQoSR 2004) and the 4th International Workshop on Internet Charging and 
QoS Technology (ICQT 2004). 

QofIS 2004, the fifth international event, was organized under the umbrella 
of the E-NEXT Network of Excellence on “Emerging Networking Experiments 
and Technologies”, which started its activities in January 2004. QofIS 2004 took 
place on September 29-30, 2004 at the Telefonica premises in Barcelona, and 
was arranged by the Universitat Politecnica de Catalunya (UPC). QofIS 2004 in
Barcelona followed the highly successful workshops in Stockholm in 2003, Zurich 
in 2002, Coimbra in 2001, and Berlin in 2000. The purpose of QofIS 2004, as of all 
QofIS events, was to present and discuss design and implementation techniques 
for providing quality of service in the Internet. 

The impact of emerging terminals, mobility and embedded systems is creating
a new environment where networks are ambient. This new space raises new
challenges: networks of interest, ranging from personal networks to large-scale
application networks, need to be designed and often integrated. Protocol
mechanisms for supporting quality of service at the different layers of the
network need to be assessed and possibly redesigned in such environments.
In this context, the focus of the QofIS 2004 workshop was on the provisioning
of Quality of Service in the Emerging Networking Panorama, assessed through
experiments carried out on simulation platforms and test-beds, and in light of
the progressive emergence of optical technologies.

QofIS 2004 contributed to this LNCS volume with 22 research papers se- 
lected from the 91 submissions received by the workshop, which address specific 
problems of quality-of-service provisioning in the fields of Internet applications, 
such as P2P and VoIP; service differentiation and congestion control; traffic en- 
gineering and routing; wireless LAN, ad hoc and sensor networks; and mobility 
in general. Accordingly, the workshop was organized in five sessions and
featured two invited talks, by Prof. Ian F. Akyildiz of the Georgia Institute of 
Technology (USA) and Prof. David Hutchison of the University of Lancaster 
(UK), and was closed with a panel session, “New Face for QoS”, featuring QofIS 
2004 invited speakers and leaders of FP6 Networks of Excellence in networking 
and QoS provisioning. 
Preface



The 1st Workshop on Quality of Service Routing was motivated by the growing 
number of contributions on this topic within the papers submitted to previous 
QofIS editions. We thought it would be worth having a set of sessions focused 
on QoS routing aspects and we decided to organize it as a co-located workshop 
within QofIS. 

Quality of service routing poses several challenges that must be addressed 
to enable the support of advanced services in the Internet, both at intra- and 
inter-domain levels. The challenges of QoS routing are related to the distribution 
of routing information and to path selection and setup throughout the network. 
Extensive research has been carried out on QoS routing in the past few years. 
New frontiers are opening up to QoS routing such as the introduction of QoS 
routing in ad hoc, wireless multihop, sensor and self-organized networks, in con- 
tent delivery networks and in optical technologies. 

The purpose of this workshop was to summarize current research in QoS 
routing, describing experimental and theoretical studies, and to point out new 
research directions leading to Smart Routing. We were glad to see that 28 papers 
were submitted to the workshop, with authors from 15 different countries and 
covering a wide range of topics focused on QoS routing. After the reviewing
process, 8 papers were selected; these are included in this LNCS volume. The
final WQoSR 2004 program was structured in two technical sessions, respectively 
devoted to Algorithms and Scalability issues and to Novel Ideas and Protocol 
Enhancements, and an invited talk given by Prof. Ariel Orda from the Technion 
at Haifa (Israel). 

We hope that these selected papers will prove appealing and stimulating for
the research community, and that this workshop will be continued in the
future.
The International Workshop on Internet Charging and QoS Technology
(ICQT 2004) was the fourth event in a series of very successful annual work- 
shops on network economics and Internet charging mechanisms. After establish- 
ing ICQT in 2001 in Vienna, Austria, the workshop was co-located once before 
with QofIS in 2002 in Zurich, Switzerland. The 2003 workshop took place in Mu- 
nich together with NGC 2003. In 2004, ICQT was again co-located with QofIS 
and provided further vivid proof of the stimulating interdisciplinary combina- 
tion of economics and networking technology, which has made these workshops 
a success story. 

As in previous years, ICQT 2004 received more than 20 submissions from 14 
different countries. Our enthusiastic Technical Program Committee managed to 
provide between 3 and 5 reviews per paper. Eventually, 8 papers were selected for 
the final program and arranged to form sessions on Auctions and Game Theory, 
Charging in Mobile Networks, and QoS Provisioning and Monitoring. Together
with the traditional invited lecture, this program presented a broad view of
current research work in the interesting area where economy meets technology,
where theory meets application, and where “QoS has its price”, as is stated in 
the title of ICQT 2004. 
Performance Analysis of
Peer-to-Peer Networks for File Distribution 



Ernst W. Biersack¹, Pablo Rodriguez², and Pascal Felber¹



¹ Institut EURECOM, France
{erbi, felber}@eurecom.fr
² Microsoft Research, UK
pablo@microsoft.com



Abstract. Peer-to-peer networks have been commonly used for tasks 
such as file sharing or file distribution. We study a class of cooperative file 
distribution systems where a file is broken up into many chunks that can 
be downloaded independently. The different peers cooperate by mutually 
exchanging the different chunks of the file, each peer being client and 
server at the same time. While such systems are already in widespread 
use, little is known about their performance and scaling behavior. We 
develop analytic models that provide insights into how long it takes to 
deliver a file to N clients. Our results indicate that the service capacity 
of these systems grows exponentially with the number of chunks a file 
consists of. 



1 Introduction 

Peer-to-peer systems, in which peer computers form a cooperative network and 
share their resources (storage, CPU, bandwidth), have attracted a lot of interest 
lately. They provide a great potential for building cooperative networks that are 
self-organizing, efficient, and scalable. 

Research in peer-to-peer networks has so far mainly focused on content storage
and lookup, and less effort has been spent on content distribution. By
capitalizing on the bandwidth of peer nodes, cooperative architectures offer great 
potential for addressing some of the most challenging issues of today’s Internet: 
the cost-effective distribution of bandwidth-intensive content to thousands of 
simultaneous users both Internet-wide and in private networks. 

Cooperative content distribution networks are inherently self-scalable, in that 
the bandwidth capacity of the system increases as more peers arrive: each new 
peer requests service from, but also provides service to, the other peers. The 
network can thus spontaneously adapt to the demand by taking advantage of 
the resources provided by every peer. 

We present a deterministic analysis that provides insights into how differ- 
ent approaches for distributing a file to a large number of clients compare. We 
consider the simple case of N peers that arrive simultaneously and request to 
download the same file. Initially, the file exists in a single copy stored at a node 
called the source or server. We assume that the file is broken up into chunks and
that peers cooperate, i.e., a peer that has completely received a chunk will offer
to upload this chunk to other peers. The time it takes to download the file to all
peers will depend on how the chunks are exchanged among the peers, which is
referred to as the peer organization strategy.

To get some insights into the performance of different peer organization 
strategies, we analytically study three different distribution models: 

- A linear chain architecture, referred to as Linear, where the peers are organized
in a chain with the server uploading the chunks to peer P_1, which in turn uploads
the chunks to P_2 and so on.

- A tree architecture, referred to as Tree_k, where the peers are organized in a
tree with an outdegree k. All the peers that are not leaves in the tree will
upload the chunks to k peers.

- A forest of trees consisting of k different trees, referred to as PTree_k, which
partitions the file into k parts and constructs k spanning trees to distribute
the k parts to all peers.

We analyze the performance of these three architectures and derive an upper 
bound on the number of peers served within an interval of time t. We consider a 
scenario where each peer has equal upload and download rates of b. The upload 
rate of the server is also b. We focus on the distribution of a single file that is 
partitioned into C chunks. The time needed to download the complete file at rate 
b is referred to as one round or 1 unit of time. Thus, the time needed to download 
a single chunk at rate b is 1/C. For the sake of simplicity, we completely ignore 
bandwidth fluctuations in the network and node failures. We assume that the
only constraint is the upload/download capacity of peers. 

Several systems have been recently proposed to leverage peer-to-peer archi- 
tectures for application-layer multicast. Most of these systems target streaming 
media (e.g., [1,2,3,4]) but some also consider bulk file transfers (e.g., [5,6]).
Experimental evaluations and measurements have been conducted on several
real-world peer-to-peer systems to observe their properties and behavior [7,8,9,10]
but, to the best of our knowledge, there has been scarcely any analytical study 
of distribution architectures for file distribution. We are only aware of one other 
paper that evaluates the performance and scalability of peer-to-peer systems by 
modeling the propagation of the file as a branching process [11]. However, no 
particular distribution architecture is assumed. The results of that paper indicate
that the number of clients that complete the download grows exponentially in
time, which is in accordance with our results.

The rest of the paper is organized as follows. Section 2 introduces the Linear 
architecture. In Section 3 we study Tree k and we evaluate PTree k in Section 4. 
We then present a comparative analysis of the three distribution models in
Section 5 and conclude the paper in Section 6. 

2 Linear: A Linear Chain Architecture 

In this section, we study the evolution over time of the number of served peers 
for the Linear architecture. We make the following assumptions: 
- The server serves the file sequentially and indefinitely at rate b. At any point
in time, the server uploads the file to a single peer. 

- Each peer starts serving the file once it receives the first chunk. 



We consider the case where each peer uploads the whole file at rate b to
exactly one other peer before it disconnects. Thus, each peer contributes the
same amount of data to the system as it receives from the system. At time 0,
the server starts serving a first peer. At time 1/C, the first peer has completely
received the first chunk and starts serving a second peer. Likewise, once the
second peer has received the first chunk at time 2/C, it starts serving a third
peer and so on. As a result, peers are connected in a chain with each peer
receiving chunks from the previous one and serving the next one. The length
(i.e., the number of peers) of the chain increases by one peer every 1/C units of
time. At time 1, the server finishes uploading the file to the first peer. If there
are still peers left that have not received a single chunk, the server starts
a new chain that also grows by one peer every 1/C units of time. The same
process repeats at each round, as shown in Figure 1 (the black circle represents
the server, the black squares are peers that start downloading the file, and the
lines connecting the peers correspond to active connections). This yields (t + 1)
chains within t rounds. The number of served peers at time t over all those chains
includes only the peers that have joined the network on or before time t − 1.
Clients that arrive after time t − 1 will take one unit of time to download the
file and will be done after time t. Given a chain initiated at time 0, its length at
time t is (1 + t · C) and the number of served peers in that chain is 1 + (t − 1)C.
Over all chains, the number of served peers within t rounds is given by
$$N_{Linear}(C,t) = \sum_{i=1}^{t} \bigl(1 + (i-1)C\bigr) = t + \frac{C\,t\,(t-1)}{2} \qquad (1)$$
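As a sanity check on Equation (1), the chain construction described above can be enumerated directly. The following Python sketch is our own illustration, not the authors' code: it assumes an unlimited pool of waiting peers and measures time in integer chunk slots of length 1/C, so no floating-point rounding enters the count. The function names are hypothetical.

```python
def served_linear_bruteforce(C: int, t: int) -> int:
    """Count peers served by time t (in rounds) by enumerating the chains.

    Time is measured in chunk slots of length 1/C, so round r begins at slot r*C.
    Chain i is started by the server at round i; its j-th peer (j = 0, 1, ...)
    starts downloading at slot i*C + j and finishes C slots (one round) later.
    Assumes there are always enough waiting peers to extend every chain.
    """
    deadline = t * C
    served = 0
    for i in range(t + 1):  # chains started at rounds 0, 1, ..., t
        j = 0
        while i * C + j + C <= deadline:  # j-th peer of chain i is done by time t
            served += 1
            j += 1
    return served


def served_linear_closed_form(C: int, t: int) -> int:
    """Equation (1): t + C*t*(t-1)/2 (t*(t-1) is even, so // is exact)."""
    return t + C * t * (t - 1) // 2


for C in (1, 4, 10):
    for t in range(6):
        assert served_linear_bruteforce(C, t) == served_linear_closed_form(C, t)

print(served_linear_closed_form(10, 3))  # 33
```

The brute-force count and the closed form agree because chain i contributes (t − 1 − i)C + 1 served peers for i < t, and the chain started at round t has not yet served anyone.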



We see that the number of peers served grows linearly with the number of
chunks C and quadratically with the number of rounds t. From Equation (1) we
derive the formula for the time needed to completely serve N peers as



$$T_{Linear}(C,N) = \frac{(C-2) + \sqrt{(C-2)^2 + 8NC}}{2C} \approx \frac{1}{2} + \sqrt{\frac{1}{4} + \frac{2N}{C}} \quad (C \gg 1) \qquad (2)$$



If N/C denotes the node to chunk ratio, we can distinguish the following
cases:

1. $T_{Linear}(C,N) \approx \frac{1}{2} + \frac{1}{2} = 1$, for $N/C \ll 1$

2. $T_{Linear}(C,N) \approx \frac{1}{2} + \frac{3}{2} = 2$, for $N/C = 1$

3. $T_{Linear}(C,N) \approx \sqrt{2N/C}$, for $N/C \gg 1$
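The closed form for T_Linear and its three regimes can be checked numerically. This is a minimal sketch with hypothetical function names: t_linear is the positive root of C·t² + (2 − C)·t − 2N = 0, obtained by setting Equation (1) equal to N, and t_linear_approx is the large-C approximation 1/2 + √(1/4 + 2N/C), which assumes C ≫ 1.

```python
import math


def t_linear(C: int, N: int) -> float:
    """Positive root of C*t^2 + (2 - C)*t - 2N = 0, i.e. Equation (1) solved for t."""
    return ((C - 2) + math.sqrt((C - 2) ** 2 + 8 * N * C)) / (2 * C)


def t_linear_approx(C: int, N: int) -> float:
    """Large-C approximation 1/2 + sqrt(1/4 + 2N/C)."""
    return 0.5 + math.sqrt(0.25 + 2 * N / C)


C = 10**6
print(t_linear(C, 1))   # N/C << 1: about one round
print(t_linear(C, C))   # N/C = 1: about two rounds
# N/C >> 1: the ratio to sqrt(2N/C) approaches 1
print(t_linear(100, 10**6) / math.sqrt(2 * 10**6 / 100))
```

Plugging t = 3 into Equation (1) with C = 10 gives N = 33, and t_linear(10, 33) returns 3.0, so the two formulas invert each other.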

Figure 2 plots T_Linear(C, N) as a function of the number of peers N for different
values of the number of chunks C. It clearly shows that, for a given number of
peers N, the smaller the node to chunk ratio N/C, the shorter the time to serve
these N peers. In fact, for N/C ≪ 1 all peers will be actively uploading chunks
for most of the time and T_Linear will be approximately one round. On the other
hand, for N/C > 1 only C out of the N peers will be uploading at any point in
time, while the other N − C peers have either already forwarded the entire file
or not yet received a single chunk.

Fig. 1. Evolution of the Linear architecture.
Fig. 2. T_Linear(C, N) as a function of N and C.



3 Tree_k: A Tree Distribution Architecture

As we have just seen, for N/C > 1 the linear chain fails to keep all the peers
working most of the time. To alleviate this problem we now consider Tree_k, a
tree architecture with outdegree k where the number of “hops” from the server
to the last peer is log_k N, as compared to N for the linear chain. We make the
following assumptions:

- The server serves k peers in parallel, each at rate b/k. 

- Each peer downloads the whole file at rate b/k. 

- A peer that is an interior (i.e., non-leaf) node of the distribution tree starts
uploading the file to k other peers, each at rate b/k, as soon as it has received
the first chunk. This means that interior nodes upload an amount of data
equivalent to k times the size of the file, while leaf nodes do not upload the
file at all.

Given a download rate of b/k, a peer needs k/C units of time to receive a
single chunk. Note that Tree_{k=1} is equivalent to Linear. We first explain the
evolution for k = 2. At time 0, the server serves 2 peers, each at rate b/2. Each
of those peers starts serving 2 new peers 2/C units later, which will need to wait
another 2/C units before they have completely received a chunk and can in turn
serve other peers. The two peers served by the server each become the root of a
tree with outdegree 2. The height of each tree increases by one level every
2/C units of time (see Figure 3).
Fig. 4. T_Tree(C, k, N) as a function of k, N, and C.



In the Tree_k architecture, k identical trees are initiated by the server at time
0, each of which will include N/k peers. The time needed to serve N peers is

$$T_{Tree}(C,k,N) = k + \left\lfloor \log_k\!\left(\frac{N}{k}\right) \right\rfloor \cdot \frac{k}{C} \qquad (3)$$

where k/C represents the delay induced by each level in the tree. Leaf peers
start receiving the first chunk ⌊log_k(N/k)⌋ · k/C units of time after the root peer
and complete the download k units of time later.

We derive from Equation (3) the number of peers served within t rounds as 

$$N_{Tree}(C,k,t) \approx k^{\left(\frac{t}{k}-1\right)C+1} = k^{\frac{(t-k)C}{k}+1} \qquad (4)$$
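The service time for Tree_k, and its inversion into a number of served peers, translate directly into code. In the sketch below (our own, with hypothetical function names) the floor of the logarithm is computed with integer arithmetic, because a floating-point logarithm can land just below an integer and shift the floor by one level. It assumes k ≥ 2 and N ≥ k; n_tree drops the floor, so it is the approximate inverse k^((t − k)C/k + 1).

```python
def floor_log(n: int, k: int) -> int:
    """Exact floor(log_k n) for integers n >= 1 and k >= 2 (no float rounding)."""
    h, power = 0, k
    while power <= n:
        h += 1
        power *= k
    return h


def t_tree(C: int, k: int, N: int) -> float:
    """k + floor(log_k(N/k)) * k/C, for k >= 2 and N >= k."""
    # floor(log_k(N/k)) = floor(log_k N - 1) = floor(log_k N) - 1
    levels_below_root = floor_log(N, k) - 1
    return k + levels_below_root * k / C


def n_tree(C: int, k: int, t: float) -> float:
    """k^((t - k) C / k + 1): the service time above inverted, floor dropped."""
    return k ** ((t - k) * C / k + 1)


T = t_tree(100, 2, 1024)  # 2 + 9 * 2/100
print(T, n_tree(100, 2, T))
```

Note that t_tree never drops below k: the k additive term is the cost of downloading at rate b/k, independent of how shallow the tree is.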

It follows from Equations (3) and (4) that the performance of file distribution
directly depends on the degree k of the tree. We can compute the optimal value
of k by taking the derivative of T_Tree with respect to k (ignoring the floor) and
setting it to zero. This gives

$$k_{opt} = \exp\!\left(\frac{-\log N + \sqrt{(\log N)^2 + 4(C-1)\log N}}{2(C-1)}\right) \quad \text{given that } N < k \cdot \ldots \qquad (5)$$

The optimal outdegree k_opt depends on the peer to chunk ratio N/C. For
N/C < 1, the optimal outdegree is 1, i.e., a linear chain, since the linear chain 
assures that the peers are uploading most of the time at their full bandwidth 
capacity. For N/C > 1, an increase in N/C leads to an increase in the optimal 
outdegree as the linear chain becomes less and less effective (remember that only 
C out of the N peers are uploading simultaneously). 
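The closed form for k_opt can be cross-checked against a direct numerical minimization of T_Tree with the floor dropped. In this sketch (ours, not the authors'), k_opt takes x = ln k as the positive root of (C − 1)x² + (ln N)x − ln N = 0, which is what setting the derivative to zero yields; the ternary search relies on the continuous T_Tree(k) having a single minimum for k > 1, which holds because that positive root is its only stationary point there. It assumes C > 1.

```python
import math


def t_tree_continuous(C: float, k: float, N: float) -> float:
    """T_Tree with the floor dropped: k + (k/C) * (ln N / ln k - 1), for k > 1."""
    return k + (k / C) * (math.log(N) / math.log(k) - 1)


def k_opt(C: float, N: float) -> float:
    """exp(x), where x = ln k is the positive root of
    (C - 1) x^2 + (ln N) x - ln N = 0. Requires C > 1."""
    L = math.log(N)
    x = (-L + math.sqrt(L * L + 4 * (C - 1) * L)) / (2 * (C - 1))
    return math.exp(x)


def k_numeric(C: float, N: float, lo: float = 1.0001, hi: float = 50.0) -> float:
    """Ternary search for the minimizer of t_tree_continuous on [lo, hi]."""
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if t_tree_continuous(C, m1, N) < t_tree_continuous(C, m2, N):
            hi = m2  # the minimum cannot lie in (m2, hi]
        else:
            lo = m1  # the minimum cannot lie in [lo, m1)
    return (lo + hi) / 2


print(k_opt(10, 10**6), k_numeric(10, 10**6))
```

For C = 10 the continuous optimum moves from roughly 1.86 at N = 10^4 to roughly 2.09 at N = 10^8, illustrating the growth of k_opt with N/C described above.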

In practice, the outdegree can only take integer values, and we see from
Figure 4 that for N/C > 1 the binary tree yields lower download times than the
linear chain. The binary tree is also the optimal tree. Remember that in T_Tree
the outdegree k appears as an additive constant that is typically much larger
than the other term, ⌊log_k(N/k)⌋ · k/C.

While the binary tree improves the download time as compared to the linear 
chain when N/C > 1, it suffers from two important shortcomings. First, although 
the maximum upload and download rate is b, the peers in a binary tree download
only at rate b/2. As a consequence, the download time is at least twice the time
it would take if the file were downloaded at the maximum possible download rate.

Second, in a binary tree of height h, there are 2^h leaf nodes and 2^h − 1 interior
nodes. Since only the interior nodes upload chunks to other peers, this means
that about half of the peers will not upload even a single block. As the outdegree k
of the tree increases, the percentage of peers uploading data keeps decreasing,
with only about one out of k peers uploading data. Also, the peers that upload
must upload the entire file k times.
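The counting argument above can be made concrete for a complete k-ary tree. The following sketch is only an illustration: it assumes a complete tree of height h, ignores the server feeding the root, and uses a hypothetical function name. It reports how many peers upload and how many full copies of the file they send in total.

```python
def kary_tree_stats(k: int, h: int) -> dict:
    """Complete k-ary tree of height h (k >= 2), as in Tree_k.

    Only interior peers upload, and each sends the whole file to k children.
    """
    leaves = k ** h
    interior = (k ** h - 1) // (k - 1)  # 1 + k + ... + k^(h-1)
    total = leaves + interior
    return {
        "leaves": leaves,
        "interior": interior,
        "uploading_fraction": interior / total,
        "file_copies_uploaded": interior * k,
    }


print(kary_tree_stats(2, 3))
print(kary_tree_stats(10, 4)["uploading_fraction"])  # close to 1/k
```

For k = 2 and h = 3, 7 of the 15 peers upload, each sending the file twice; the 14 copies are exactly one per non-root peer, so the whole upload load falls on fewer than half of the peers.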

4 PTree_k: An Architecture Based on Parallel Trees

The overall performance of the tree architecture would be significantly improved
if we could capitalize on the unused upload capacity of the leaves to utilize the
b − b/k unused download capacity at each of the peers. It is not possible, however,
for a leaf to serve other peers upward in its tree because it only holds chunks that
its ancestors already have. Given a tree architecture with k trees rooted at the
server, the basic intuition underlying the PTree_k architecture is to “connect” the
leaves of one of the trees to peers of the other k − 1 trees to ultimately produce
k spanning trees, and have the server send distinct chunks to each of these trees.

More specifically, the PTree_k architecture organizes the peers in k different
trees such that each peer is an interior peer in at most one tree and a leaf peer
in the remaining k − 1 trees. The file is then partitioned into k parts, where
each part is distributed on a different tree: tree T_k for part P_k. All k parts
have the same size in terms of number of bytes. If the entire file is divided
into C chunks, each of the k parts will comprise C/k disjoint chunks.¹ Such a
distribution architecture was first proposed under the name of SplitStream [3] to
increase the resilience against churn (i.e., peers failing or leaving prematurely)
in a video streaming application.

In PTree_k, a peer receives the k parts in parallel from k different peers, each
part at rate b/k, while the peer helps distribute at most one part of the file to
k other peers. Therefore, the total amount of data a peer uploads corresponds
exactly to the amount contained in the file, regardless of the outdegree k of the
trees.

Figure 5 depicts the basic idea of PTree_{k=2}, where k denotes the outdegree
of the tree. Each peer, except for peer 4, is an interior peer in one tree and a
leaf peer in another tree. It is easy to show that, independent of the outdegree
k, there will always be one peer in PTree_k that is a leaf in all k trees. Each tree
includes all N peers. A tree with outdegree k has 1 + ⌊log_k N⌋ levels and a height
of ⌊log_k N⌋. Since peers transmit at a rate b/k, each level in the tree induces a
delay of k/C units of time.
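The text does not prescribe how the k trees are built. One simple construction that satisfies the PTree_k constraints, offered here only as an illustrative assumption (build_ptree and upload_per_peer are hypothetical names), gives each tree its own block of interior peers and lays the tree out breadth-first. It requires N − 1 to be a multiple of k, and it leaves exactly one peer that is a leaf in all k trees.

```python
from fractions import Fraction


def build_ptree(N: int, k: int) -> list[dict[int, list[int]]]:
    """One possible PTree_k layout (hypothetical construction).

    With m = (N - 1) / k, peers [i*m, (i+1)*m) are the interior peers of tree i,
    and peer N - 1 is a leaf in every tree. Each tree is a complete k-ary tree in
    breadth-first order: position p has children at positions k*p+1 .. k*p+k.
    The server (not listed) feeds the root of each tree, i.e. position 0.
    """
    if k < 2 or (N - 1) % k != 0:
        raise ValueError("need k >= 2 and (N - 1) divisible by k")
    m = (N - 1) // k
    trees = []
    for i in range(k):
        interior = list(range(i * m, (i + 1) * m))
        rest = [p for p in range(N) if not i * m <= p < (i + 1) * m]
        order = interior + rest  # breadth-first order of tree i
        trees.append({order[p]: [order[c] for c in range(k * p + 1, min(k * p + k + 1, N))]
                      for p in range(N)})
    return trees


def upload_per_peer(trees: list[dict[int, list[int]]], k: int) -> dict[int, Fraction]:
    """Data uploaded by each peer, in units of the whole file (each part is 1/k)."""
    totals: dict[int, Fraction] = {}
    for children in trees:
        for peer, kids in children.items():
            totals[peer] = totals.get(peer, Fraction(0)) + Fraction(len(kids), k)
    return totals


print(upload_per_peer(build_ptree(7, 2), 2))
```

With N = 7 and k = 2, peers 0 to 5 each upload exactly one file's worth of data (one part of size 1/2 to two children) and peer 6 uploads nothing, matching the single all-leaf peer noted above.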

Consider a leaf peer C_0 in tree T_k that is located ⌊log_k N⌋ levels down from
the root of T_k. C_0 starts receiving part P_k at time ⌊log_k N⌋ · k/C, and the time
to receive P_k entirely, once reception has started, is 1. Therefore, C_0 will have
completely received part P_k at time 1 + ⌊log_k N⌋ · k/C.

¹ For the sake of simplicity, we assume that the number of chunks C is a multiple of
the number of parts k.

Fig. 5. Evolution of the PTree_{k=2} architecture with time.
Fig. 6. T_PTree(C, k, N) as a function of k, N, and C.

A peer has completed its download after it has received the last byte of each
of the k parts of the file. A PTree_k peer is a leaf node in k − 1 trees and an interior
node in one tree, and it receives all k parts in parallel. This means that all peers
complete their download at the same time, 1 + ⌊log_k N⌋ · k/C. We therefore have

$$T_{PTree}(C,k,N) = 1 + \lfloor \log_k N \rfloor \cdot \frac{k}{C} \qquad (6)$$

We derive from Equation (6) the number of peers served within t rounds as

$$N_{PTree}(C,k,t) \approx k^{\frac{(t-1)C}{k}} \qquad (7)$$
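Equation (6), and its inversion into a number of served peers, can be evaluated the same way. This is a sketch with hypothetical names: the floor is computed exactly with integers, k ≥ 2 is assumed, and n_ptree is Equation (6) solved for N with the floor dropped, i.e. k^((t − 1)C/k).

```python
def floor_log(n: int, k: int) -> int:
    """Exact floor(log_k n) for integers n >= 1 and k >= 2."""
    h, power = 0, k
    while power <= n:
        h += 1
        power *= k
    return h


def t_ptree(C: int, k: int, N: int) -> float:
    """Equation (6): every peer finishes at 1 + floor(log_k N) * k/C."""
    return 1 + floor_log(N, k) * k / C


def n_ptree(C: int, k: int, t: float) -> float:
    """k^((t - 1) C / k): Equation (6) solved for N, floor dropped."""
    return k ** ((t - 1) * C / k)


T = t_ptree(100, 3, 243)  # 1 + 5 * 3/100
print(T, n_ptree(100, 3, T))
print(t_ptree(10**6, 3, 10**6))  # very close to one round
```

Unlike Tree_k, the constant term is 1 rather than k, because each peer receives its k parts in parallel at an aggregate rate of b.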

Similarly to the tree distribution architecture in Section 3, there is an optimal 
value k for PTree k that minimizes the service time. Intuitively, a very deep tree 
should be quite inefficient in engaging peers early since leaves are quite far from 
the source. In fact, PTree k=1 is equivalent to Linear, which is very inefficient in 
engaging peers for N/C > 1. On the other hand, when the outdegree of the tree 
is large, leaf peers are only a few hops from the source and can be engaged quickly.
However, this intuition is not completely correct: flat trees with large outdegrees 
suffer from the problem that, as the outdegree k increases, the rate b/k at which 
each chunk is transmitted from one level to the next one decreases linearly with 
k. This rate reduction can negate the benefits of having many peers reachable 
within few hops. 

We can compute the optimal tree outdegree that provides the best PTree_k
performance by taking the derivative of Equation (6) with respect to k (ignoring
the floor) and equating the result to zero. We find T_PTree to be minimal for
k = e, independent of the peer to chunk ratio N/C.
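Dropping the floor, the derivation takes two lines. Writing log_k N as ln N / ln k shows that k enters only through the factor k/ln k, while N and C appear only in the positive prefactor ln N / C, which is why the optimum does not depend on N/C:

```latex
T_{PTree}(C,k,N) \approx 1 + \frac{k}{C}\log_k N = 1 + \frac{\ln N}{C}\cdot\frac{k}{\ln k}

\frac{\partial T_{PTree}}{\partial k} = \frac{\ln N}{C}\cdot\frac{\ln k - 1}{(\ln k)^2} = 0
\quad\Longrightarrow\quad \ln k = 1 \quad\Longrightarrow\quad k = e
```

The derivative is negative for 1 < k < e and positive for k > e, so k = e is a minimum rather than a maximum.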

Figure 6 depicts the performance of PTree k as a function of the outdegree. We 
see that the optimal PTree k performance is obtained for trees with an outdegree 
k = 3. However, the performance for k = 2 and k = 4 is almost the same as 
for k = 3. As the outdegree increases, the performance of PTree_k degrades: for
N/C ≈ 1 the degradation is very small, while for N/C ≫ 1 it is quite pronounced.
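Since, without the floor, k enters T_PTree only through the factor k/ln k, comparing that factor for small integers shows why k = 3 is best among integers and why k = 2 and k = 4 are nearly as good. In this continuous model they actually tie, because 4/ln 4 = 2/ln 2; with the floor included, small differences can appear. A minimal sketch (the function name is ours):

```python
import math


def ptree_cost_factor(k: float) -> float:
    """k / ln k, the only k-dependent factor of T_PTree once the floor is dropped."""
    return k / math.log(k)


for k in (2, 3, 4, 5, 6):
    print(k, round(ptree_cost_factor(k), 3))
print("e", round(ptree_cost_factor(math.e), 3))
```

The factor at k = 3 is within about 0.5% of its minimum value e reached at k = e, while it grows steadily beyond k = 4, which is the degradation with larger outdegrees described above.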

By striping content across multiple trees, PTree k can ensure that the de- 
parture of one peer causes only a minimal disruption to the system, reducing 
the peer’s throughput only by b/k. Given that the overhead caused by churn 
can be minimized by striping content across a higher number of trees, one can 
consider slightly higher outdegrees than the optimal value (e.g., 5) to minimize 
the impact of churn at the expense of a minimal increase in transfer time. 

5 Comparative Analysis 

In this section, we compare the performance of the Linear, Tree_k and PTree_k
architectures. We first investigate how the time needed to serve N peers varies
as a function of the number of peers N and the number of chunks C. 



Fig. 7. Performance of Linear, Tree_k and PTree_k (k ∈ {2, 3}) as a function of N:
(a) C = 10^2; (b) C = 10^6.



From Figure 7, we see that independent of the number of nodes and chunks,
PTree_k is able to offer download times close to 1. On the other hand, as already
pointed out, the download times for Tree_k are always at least k units of time
(see Equation (3)). PTree_k with optimal outdegree k = 3 provides clear benefits
over Linear for N/C ≫ 1, since peers are engaged in uploading chunks much
faster than with a linear chain.

When the propagation delay of the first chunk is very small compared to
the transmission time of the file, the peers stay engaged most of the time in
the linear chain and the benefit of PTree_k diminishes. This is the case when
the number of chunks is very large (C → ∞), the number of peers is small,
or the transmission rate is very high. The pivotal point where PTree_k starts to
significantly outperform Linear is around N/C ≈ 10^{-1} (see Figure 7(b)).
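A side-by-side evaluation of the three service-time formulas (T_Linear, T_Tree and T_PTree) reproduces the qualitative picture described above. The sketch assumes the binary tree for Tree_k and k = 3 for PTree_k; the parameter grid is an arbitrary example, not the set of values used for Figure 7, and the function names are hypothetical.

```python
import math


def floor_log(n: int, k: int) -> int:
    """Exact floor(log_k n) for integers n >= 1 and k >= 2."""
    h, power = 0, k
    while power <= n:
        h += 1
        power *= k
    return h


def t_linear(C: int, N: int) -> float:
    """Linear chain: positive root of C*t^2 + (2 - C)*t - 2N = 0."""
    return ((C - 2) + math.sqrt((C - 2) ** 2 + 8 * N * C)) / (2 * C)


def t_tree(C: int, k: int, N: int) -> float:
    """Tree_k: k + floor(log_k(N/k)) * k/C, for N >= k >= 2."""
    return k + (floor_log(N, k) - 1) * k / C


def t_ptree(C: int, k: int, N: int) -> float:
    """PTree_k: 1 + floor(log_k N) * k/C, for k >= 2."""
    return 1 + floor_log(N, k) * k / C


for C in (10**2, 10**6):
    for N in (10**2, 10**4, 10**6):
        print(f"C={C:>7} N={N:>7}  Linear={t_linear(C, N):8.3f}  "
              f"Tree_2={t_tree(C, 2, N):6.3f}  PTree_3={t_ptree(C, 3, N):6.3f}")
```

On this grid PTree_3 is never slower than the linear chain and always beats the binary tree; the linear chain beats the binary tree only when N/C is small (for example C = 10^6 and N = 10^4).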

The comparison of the different approaches would be incomplete if we did not 
address aspects such as robustness and ease of deployment. Cooperative file dis- 
tribution relies on the collaboration of individual computers that are not subject 
to any centralized control or administration. Therefore, the failure or departure
of some computers during the data exchange is likely to occur and should
also be taken into account when comparing approaches. For the linear chain and
the tree, the departure of a node will disconnect all the nodes “downstream”
from the data forwarding. With PTree_k, the departure of a node affects only
one out of k trees, which makes parallel trees the most robust
of the three approaches.

The Linear and PTree k architectures both assume that the upload and down- 
load rates are the same. In practice, this is often not the case (e.g., the upload 
rate of ADSL lines is only a fraction of the download rate). In such cases, the 
performance will be limited by the upload rate and some of the download band- 
width capacity will remain unused. The Tree k architecture assumes that nodes 
can upload data at a rate k times higher than the download rate, which is exactly 
the opposite of what ADSL offers. The tree approach is therefore particularly
ill-suited for such environments. 

From these results we can conclude that, for most typical file-transfer scenar- 
ios, PTree k provides significant benefits over the other two architectures studied 
in this paper. 



6 Conclusions and Perspectives 

The self-scaling and self-organizing properties of peer-to-peer networks offer the
technical capabilities to quickly and efficiently distribute large or critical content
to huge populations of clients. Cooperative distribution techniques capitalize on
the bandwidth of every peer to offer a service capacity that grows exponentially,
provided that blocks are exchanged among the peers in such a way that the
peers are busy most of the time. The architecture that best achieves this goal
among those studied in the paper, independently of the peer to chunk ratio
N/C, is PTree_k. For both Tree_k and PTree_k there is an optimal outdegree that
minimizes the download time. 

Our analysis provided some important insights as to how to choose certain 
key parameters such as C and k. First, the file should be partitioned into a large 
number of chunks C , since the performance scales exponentially with C (but 
not too many as each chunk adds some coordination and connection overhead). 
Second, each peer should limit the number k of simultaneous uploads to other 
peers. We saw that for PTree k a good value for k is between 3 and 5. 

The results of our study also guide the design of cooperative peer-to-peer file 
sharing applications that do not organize the peers in such a static way as do
the linear chain or tree(s) but use a mesh instead (e.g., BitTorrent [6]). Here,
a peer must decide how many peers to serve simultaneously (the outdegree k) 
and what chunks to serve next (the “chunk selection strategy”). For each chunk, 
a peer selects the peer it wants to upload that chunk to (the “peer selection 
strategy”). 

Consider a peer selection strategy that gives preference to the peers that are 
closest to completion (among those that have the fewest incoming connections), 
and a chunk selection strategy that favors the chunks that are least widely held
in the system. Assume that each peer only accepts a single inbound connection.
With 1 outbound connection per peer, we trivially obtain a linear chain; with 2
outbound connections, we obtain a binary tree Tree_{k=2}; and so on. Failures are
handled gracefully, as the parent of a failed peer automatically reconnects to
the next peer in the chain or tree.
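The peer and chunk selection strategies just described can be sketched in a few lines. This is a hypothetical illustration of the two rules (among peers missing something the uploader has, prefer the fewest inbound connections and then the peer closest to completion; send the least widely held chunk the target lacks), not BitTorrent's actual algorithm; the data structures and names are assumptions.

```python
from collections import Counter
from typing import Optional


def select_target(uploader: str, holdings: dict[str, set[int]],
                  inbound: dict[str, int]) -> Optional[str]:
    """Peer selection: fewest inbound connections first, then closest to completion.

    Candidates are peers missing at least one chunk the uploader holds;
    ties are broken by peer name for determinism.
    """
    mine = holdings[uploader]
    candidates = [p for p in holdings if p != uploader and mine - holdings[p]]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (inbound[p], -len(holdings[p]), p))


def select_chunk(uploader: str, target: str, holdings: dict[str, set[int]]) -> int:
    """Chunk selection (rarest first): the least widely held chunk the target lacks."""
    rarity = Counter()
    for chunks in holdings.values():
        rarity.update(chunks)
    useful = holdings[uploader] - holdings[target]
    return min(useful, key=lambda c: (rarity[c], c))


holdings = {"S": {0, 1, 2, 3}, "A": {0, 1}, "B": {0}, "C": set()}
inbound = {"S": 0, "A": 1, "B": 1, "C": 1}
target = select_target("S", holdings, inbound)
print(target, select_chunk("S", target, holdings))  # A 2
```

Here the source S picks A (all candidates have one inbound connection, and A holds the most chunks) and sends chunk 2, which only S holds; repeatedly sending the rarest chunks is what makes the source feed distinct parts into different branches.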

If we now allow each peer to have k inbound and k outbound connections, we 
obtain a configuration equivalent to PTree k . Indeed, the source will fork k trees 
to which it will send distinct chunks (remember that we give preference to the 
rarest chunks). The leaves of the trees, which have free outbound capacity, will 
connect to the peers of the other trees to eventually create k parallel spanning 
trees. Such mesh-based systems, whose topology dynamically evolves according
to predefined peer and chunk selection strategies, offer service times as low as 
the ones of PTree k and adjust dynamically to bandwidth fluctuations, bandwidth 
heterogeneity, and node failures [12]. 
Abstract. TOPLUS is a lookup service for structured peer-to-peer networks that
is based on the hierarchical grouping of peers according to network IP prefixes.
In this paper we present MULTI+, an application-level multicast protocol for
content distribution over a peer-to-peer (P2P) TOPLUS-based network. We use
the characteristics of TOPLUS to design a protocol that allows every peer to
connect to an available peer that is close. MULTI+ trees also reduce the amount
of redundant flows leaving and entering each network, making bandwidth usage
more efficient.



1 Introduction 

IP Multicast seems (or, at least, was designed) to be the ideal solution for content
distribution over the Internet: (1) it can serve content to an unlimited number of destinations,
and (2) it is economical in terms of bandwidth. These two characteristics are strongly correlated.
IP Multicast saves bandwidth because a single data flow can feed many recipients. The
data flow is only split at routers where destinations for the data are found in more than
one outgoing port. Thus n clients do not need n independent data flows, which is what
allows IP Multicast to scale. However, IP Multicast was never widely deployed in the
Internet: for security reasons and because of its open-loop nature, IP Multicast has remained
a tool of limited use for other protocols in LANs. The core Internet lacks an infrastructure with
the characteristics of IP Multicast.
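The bandwidth argument above can be made concrete with a small count. The sketch below uses a hypothetical four-client topology in which each route is given as a list of link names (both are assumptions for illustration). It counts how many copies of one stream cross each link when the source serves every client by unicast, versus IP Multicast, where a link carries a single copy regardless of how many clients lie downstream.

```python
from collections import Counter

def unicast_flows(paths):
    """Unicast: each client receives its own flow over every link of its route."""
    load = Counter()
    for route in paths.values():
        load.update(route)
    return load

def multicast_flows(paths):
    """IP Multicast: every link used by at least one route carries exactly one copy."""
    return Counter({link: 1 for route in paths.values() for link in route})

# Hypothetical routes from source S: c1..c3 sit behind routers R1 and R2, c4 behind R1.
paths = {
    "c1": ["S-R1", "R1-R2", "R2-c1"],
    "c2": ["S-R1", "R1-R2", "R2-c2"],
    "c3": ["S-R1", "R1-R2", "R2-c3"],
    "c4": ["S-R1", "R1-c4"],
}

print(unicast_flows(paths)["S-R1"])    # 4: one copy per client leaves the source
print(multicast_flows(paths)["S-R1"])  # 1: the flow is split later, at R1 and R2
```

In this toy case the total link usage drops from 11 flow-links to 6. The saving grows with the number of clients that share upstream links.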

Lately, with the advent of broadband links like ADSL and the generalization of LANs
at the workplace, the edges of the Internet have started to increase their bandwidth. Together
with ever-cheaper yet more powerful equipment (computational power, storage
capacity), this gives the millions of hosts connected to the Internet the possibility of
implementing, by themselves, services that augment the capabilities of the network at the
application level: the Peer-to-Peer (P2P) systems. Various application-level multicast
implementations have been proposed [1,2,3,4,5], most of which are directly implemented
on top of P2P infrastructures (Chord [6], CAN [7] or Pastry [8]). The good scalability
of the underlying P2P networks gives these application-level multicast systems one of the
properties of the original IP Multicast service, that of serving content to a virtually unlimited
number of clients (peers). However, these P2P networks are generally conceived as
application-layer systems completely isolated from the underlying IP network.

Thus, the P2P multicast systems that we know of may fail at the second goal of IP
Multicast: a LAN hosting a number of peers in a P2P multicast tree may find its outbound
link saturated by identical data flowing to and from its local peers, unless those peers
are somehow aware of the fact that they are sharing the same network. This is a critical
issue for ISPs due to P2P file-sharing applications and flat-rate commercial offers that
allow a home computer to download content 24 hours a day. This problem also
affects application-level multicast sessions if peers do not take the network topology
into account. We have based our P2P multicast protocol on TOPLUS because of its inherent
topology-awareness. We assume that there is a large population of peers, which justifies
the use of multicast, that many multicast groups may coexist without interfering
with each other, and that each peer must agree to cooperate with others in low-bandwidth
maintenance tasks, but is not forced to transmit content that does not interest it.

Related Work. Some examples of overlay networks that introduce topology-awareness
in their design are SkipNet [9], Coral [10], Pastry [11] and CAN [12]. Application-level
multicast has produced interesting results such as the NICE project [1] or End System
Multicast [2]. Other application-level multicast implementations use overlay networks, as
we do, to create multicast trees: Bayeux [3], CAN [4] and Pastry (Scribe) [5]. However, these
approaches are not designed to optimize a metric such as delay or bandwidth utilization.
An interesting comparison is given in [13]. Content distribution overlay examples are
SplitStream [14] and [15]. Recently, the problem of data dissemination on adaptive
overlays has been treated in [16]. Our main contribution is the construction of efficient
topology-aware multicast trees with little or no active measurement, using distributed
algorithms, whereas other approaches aiming at similar goals require extensive probing [17], or rely
on a much wider knowledge of the peer population [17] [18] than ours.

In the next section we present the main aspects of TOPLUS. In Section 3 we describe
MULTI+. In Section 4 we discuss some results on the properties of MULTI+ multicast trees,
before we conclude and sketch future work in Section 5.



2 TOPLUS Overview 

TOPLUS [19] is based on the DHT paradigm, in which a resource is uniquely identified 
by a key, and each key is associated with a single peer in the network. Keys and peers 
share a numeric identifier space, and the peer with the identifier closest to a key is 
responsible for that key. The principal goal of TOPLUS is simple: each routing step 
takes the message closer to the destination. 

Let I be the set of all 32-bit IP addresses. Let Q be a collection of sets such that
$G \subseteq I$ for each $G \in Q$. Thus, each set $G \in Q$ is a set of IP addresses. We refer to each
such set G as a group. Any group $G \in Q$ that does not contain another group in Q is said
to be an inner group. We say that the collection Q is a proper nesting if it satisfies all the
following properties:

1. $I \in Q$.

2. For any pair of groups in Q, the two groups are either disjoint, or one group is a
proper subset of the other.

3. Each $G \in Q$ consists of a set of contiguous IP addresses that can be represented by
an IP prefix of the form w.x.y.z/n (for example, 123.13.78.0/23).
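The three conditions can be checked mechanically. The following sketch assumes each group is given as a hypothetical (first, last) IPv4 address range. It uses Python's standard `ipaddress` module to test whether a range is a single prefix (property 3), whether I = 0.0.0.0/0 is present (property 1), and whether every pair of groups is disjoint or properly nested (property 2).

```python
import ipaddress
from itertools import combinations

def as_prefix(first, last):
    """The single prefix w.x.y.z/n covering [first, last], or None if there is none."""
    nets = list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))
    return nets[0] if len(nets) == 1 else None

def is_proper_nesting(ranges):
    """ranges: one (first, last) IPv4 address pair per group G."""
    nets = [as_prefix(first, last) for first, last in ranges]
    if any(n is None for n in nets):                    # property 3
        return False
    if ipaddress.ip_network("0.0.0.0/0") not in nets:   # property 1: I is a group
        return False
    for a, b in combinations(nets, 2):                  # property 2
        if a == b:
            return False        # the same group listed twice is rejected
        # For CIDR prefixes an overlap always means nesting, so this test mirrors
        # the definition rather than catching extra cases.
        if a.overlaps(b) and not (a.subnet_of(b) or b.subnet_of(a)):
            return False
    return True

I = ("0.0.0.0", "255.255.255.255")
groups = [I, ("123.13.0.0", "123.13.255.255"), ("123.13.78.0", "123.13.79.255")]
print(is_proper_nesting(groups))                                       # True
print(is_proper_nesting(groups + [("123.13.78.0", "123.13.80.255")]))  # False
```

The second call fails because 123.13.78.0 to 123.13.80.255 can only be covered by two prefixes, so it violates property 3.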
The collection of sets Q can be created by collecting the IP prefix networks from
BGP tables and/or other sources [20]. In this case, many of the sets in Q would correspond
to ASes, other sets would be subnets in ASes, and yet other sets would be aggregations
of ASes. This approach of defining Q from BGP tables requires that a proper nesting is
created. Note that the groups differ in size, and in number of subgroups (the fanout). If
Q is a proper nesting, then the relation $G \subset G'$ defines a partial ordering over the sets
in Q, generating a partial-order tree with multiple tiers. The set I is at tier-0, the highest
tier. A group G belongs to tier 1 if there does not exist a $G'$ (other than I) such that
$G \subset G'$. We define the remaining tiers recursively in the same manner (see Figure 1).
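The tier definition translates directly into code. The sketch below assumes the collection Q is already a proper nesting, given as a list of CIDR strings; the sample prefixes are hypothetical.

```python
import ipaddress

def tiers(prefixes):
    """Tier of each group in a proper nesting: I (0.0.0.0/0) is tier 0 and any other
    group lies one tier below the smallest group that strictly contains it."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # The groups containing G form a chain up to I, so G's tier is the number of
    # groups that strictly contain it.
    return {str(g): sum(1 for h in nets if h != g and g.subnet_of(h)) for g in nets}

def inner_groups(prefixes):
    """Groups that contain no other group of the collection."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(g) for g in nets if not any(h != g and h.subnet_of(g) for h in nets)]

Q = ["0.0.0.0/0", "123.0.0.0/8", "123.13.0.0/16", "123.13.78.0/23", "193.0.0.0/8"]
print(tiers(Q))
# {'0.0.0.0/0': 0, '123.0.0.0/8': 1, '123.13.0.0/16': 2, '123.13.78.0/23': 3, '193.0.0.0/8': 1}
print(inner_groups(Q))  # ['123.13.78.0/23', '193.0.0.0/8']
```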





Fig. 1. A sample TOPLUS hierarchy (inner groups are represented by plain boxes)

Fig. 2. A simple multicast tree.



Peer State. Let L denote the number of tiers in the tree, let U be the set of all currently
active peers and consider a peer $p \in U$. Peer p is contained in a collection of telescoping
sets in Q; denote these sets by $H_N(p), H_{N-1}(p), \ldots, H_0(p) = I$, where
$H_N(p) \subset H_{N-1}(p) \subset \cdots \subset H_0(p)$ and $N \le L$ is the tier depth of p's inner group.
Except for $H_0(p)$, each of these telescoping sets has one or more siblings in the partial-order tree
(see Figure 1). Let $S_i(p)$ be the set of sibling groups of $H_i(p)$ at tier i. Finally, let $S(p)$
be the union of the sibling sets $S_1(p), \ldots, S_N(p)$.

Peer p should know the IP address of at least one peer in each group $G \in S(p)$,
as well as the IP addresses of all the other peers in p's inner group. We refer to the
collection of these two sets of IP addresses as peer p's routing table, which constitutes
peer p's state. The total number of IP addresses in the routing table of a peer in tier L is
$|H_L(p)| + |S(p)|$. In [19] we describe how a new peer can join an existing TOPLUS
network.
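A peer's state can be derived from the prefix hierarchy. The sketch below uses hypothetical prefixes and peer addresses with Python's `ipaddress` module. It computes the telescoping sets H_i(p), the sibling set S(p), and the routing-table size |H_N(p)| + |S(p)|. The size formula is followed literally, so the inner group is counted by its number of peers, with p included.

```python
import ipaddress

def parent_of(g, nets):
    """Smallest group strictly containing g, or None when g is I."""
    supers = [h for h in nets if h != g and g.subnet_of(h)]
    return max(supers, key=lambda h: h.prefixlen, default=None)

def peer_state(p, prefixes, peers):
    """Telescoping sets H_N(p), ..., H_0(p), sibling set S(p) and routing-table size."""
    nets = [ipaddress.ip_network(x) for x in prefixes]
    addr = ipaddress.ip_address(p)
    # H_N(p) ⊂ ... ⊂ H_0(p) = I: every group containing p, deepest (longest prefix) first.
    chain = sorted((g for g in nets if addr in g), key=lambda g: g.prefixlen, reverse=True)
    # S(p): for each H_i(p) other than I, the groups sharing its parent.
    siblings = [s for h in chain[:-1] for s in nets
                if s != h and parent_of(s, nets) == parent_of(h, nets)]
    inner = chain[0]
    inner_peers = [q for q in peers if ipaddress.ip_address(q) in inner]
    # |H_N(p)| + |S(p)|, counting H_N(p) by its peers (p included, as in the formula).
    return chain, siblings, len(inner_peers) + len(siblings)

Q = ["0.0.0.0/0", "123.0.0.0/8", "193.0.0.0/8", "123.13.0.0/16",
     "123.14.0.0/16", "123.13.78.0/23", "123.13.80.0/23"]
U = ["123.13.78.5", "123.13.78.9", "123.13.80.1", "193.1.1.1"]
chain, S, size = peer_state("123.13.78.5", Q, U)
print([str(g) for g in chain])  # ['123.13.78.0/23', '123.13.0.0/16', '123.0.0.0/8', '0.0.0.0/0']
print([str(s) for s in S])      # ['123.13.80.0/23', '123.14.0.0/16', '193.0.0.0/8']
print(size)                     # 5
```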

XOR Metric. Each key k' is required to be an element of I', where I' is the set of all
s-bit binary strings ($s \ge 32$ is fixed). A key can be drawn uniformly at random from I',
or it can be biased as we shall describe later. For a given key $k' \in I'$, let k be the 32-bit
suffix of k' (thus $k \in I$ and $k = k_{31} k_{30} \ldots k_1 k_0$). Throughout the discussion below, we
will refer to k rather than to the original k'.

The XOR metric defines the distance between two IDs j and k as
$d(j, k) = \sum_{\nu=0}^{31} |j_\nu - k_\nu| \cdot 2^\nu$. The metric d(j, k) has the following properties, for IDs i, j and k:

- If $d(i, k) = d(j, k)$ for any k, then $i = j$.

- Let $p(j, k)$ be the number of bits in the common prefix of j and k. If $p(j, k) = m$, then
$d(j, k) \le 2^{32-m} - 1$.

- If $d(i, k) \le d(j, k)$, then $p(i, k) \ge p(j, k)$.

d(j, k) is a refinement of longest-prefix matching. If j is the unique longest-prefix match
with k, then j is the closest to k in terms of the metric. Further, if two peers share the
longest matching prefix, the metric will break the tie. The peer $p^*$ that minimizes $d(k, p)$,
$p \in U$, is "responsible" for key k.
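Each term |j_ν − k_ν| is 1 exactly where the two bits differ, so d(j, k) is simply the bitwise XOR of j and k read as an integer. The minimal sketch below assumes that keys have already been reduced to their 32-bit suffix and are written as dotted quads for readability. The peer addresses are made up.

```python
import ipaddress

def ident(x):
    """32-bit identifier, written as a dotted quad for readability."""
    return int(ipaddress.IPv4Address(x))

def d(j, k):
    """XOR metric: sum over bits v of |j_v - k_v| * 2**v, i.e. the integer j XOR k."""
    return ident(j) ^ ident(k)

def common_prefix_len(j, k):
    """p(j, k): number of leading bits shared by j and k."""
    return 32 - d(j, k).bit_length()

def responsible(key, peers):
    """The peer p* in U minimising d(key, p)."""
    return min(peers, key=lambda p: d(key, p))

U = ["10.0.0.1", "10.0.1.7", "192.168.1.1"]
print(responsible("10.0.1.200", U))  # 10.0.1.7 (shares 24 leading bits with the key)
```

Here d("10.0.1.7", "10.0.1.200") = 7 XOR 200 = 207. This is within the bound 2^(32-24) - 1 = 255 given by the second property.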

3 MULTI+: Multicast on TOPLUS 

A Multicast Tree. First we assume that all peers are connected through links providing
enough bandwidth. A simple multicast tree is shown in Figure 2. Let S be the source
of the multicast group m. Peer p is receiving the flow from peer q. We say that q is the
parent of p in the multicast tree. Conversely, we say that p is a child of q. Peer p is in
level-3 of the multicast tree and q in level-2. It is important to note that, in principle, the
level a peer occupies in the multicast tree has nothing to do with the tier the peer belongs
to in the TOPLUS tree.

In the kind of multicast trees we aim to build, each peer should be close to its
parent in terms of network delay, while trying to join the multicast tree as high (close
to the source) as possible. At join time, each peer attempts to minimize the number of
hops from the source and the length of the last hop. In the example of Figure 2, if p is
a child of q and not of r, it is because p is closer to q than to r. By trying to minimize
the network delay for data transmission between peers, we also avoid rearranging peers
inside the multicast tree, except when a peer fails or disconnects.

Building Multicast Trees. We use the TOPLUS network and look-up algorithm in order
to build the multicast trees. Consider a multicast IP address m, and the corresponding
key that, abusing the notation, we also denote m. Each tier-i group $G_i$ is defined by an IP
network prefix $a_i/b$, where $a_i$ is an IP address and b is the length of the prefix in bits. Let
$m_i$ be the key resulting from the substitution of the first b bits of m by those of $a_i$. The
inner group that contains the peer responsible for $m_i$ (obtained with a TOPLUS look-up)
is the responsible inner group, or RIG, for m in $G_i$ (note that this RIG is contained in
$G_i$). Hereafter, we assume a single m, and for that m and a given peer p we denote the
RIG in $H_i(p)$ (a tier-i group) simply as RIG-i of p. This RIG is a rendezvous point for all peers
in $H_i(p)$. The deeper the tier i of a RIG-i lies in the TOPLUS tree, the narrower the
scope of the RIG as a rendezvous point (fewer peers can potentially use it).
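The substitution that produces m_i is a bit-mask operation. The sketch below uses Python's `ipaddress` module, and the multicast key and group prefix are hypothetical values. A TOPLUS look-up of the resulting key inside G_i (not shown) would then locate the RIG.

```python
import ipaddress

def key_in_group(m, group):
    """m_i: the first b bits of key m replaced by those of the group prefix a_i/b."""
    net = ipaddress.ip_network(group)
    mask = int(net.netmask)                   # b leading ones, then zeros
    prefix_bits = int(net.network_address) & mask
    suffix_bits = int(ipaddress.IPv4Address(m)) & ~mask & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(prefix_bits | suffix_bits))

print(key_in_group("239.1.2.3", "193.51.24.0/22"))  # 193.51.26.3
```

All groups in the hierarchy map m into their own address range in the same way. For I = 0.0.0.0/0 the key is returned unchanged.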

In the simple 3-tier example of Figure 3, we have labeled the RIGs for a given
multicast group (peers in grey are members of the multicast group), where all inner
groups are at tier-3. The RIG-i of a peer can be found by following the arrows. The arrows
represent the process of asking the RIGs for a parent in the multicast tree. For example,
p and q share the same RIG-1 because they are in the same tier-1 group. Peer t's inner group
is its RIG-1, but t would first contact a peer x (white) in its RIG-2 to ask for a parent.
Note that this last peer is not in the multicast tree (Figure 4).

Assume a peer p in tier-(i + 1) (i.e., a peer whose inner group is at tier-(i + 1) of
the TOPLUS tree) wants to join a multicast tree with multicast IP address m, which we
call group m.
Fig. 3. The RIGs in a sample TOPLUS network.

Fig. 4. Sample multicast tree.



1. The peer p broadcasts a query to join group m inside its inner group. If there is a
peer p' already part of group m, p connects to p' to receive the data.

2. If there is no such peer p', p must look for its RIG-i. A look-up of $m_i$ inside p's tier-i
group (thus among p's sibling groups at tier-(i + 1)) locates the RIG-i responsible
for m. Peer p contacts any peer $p_i$ in RIG-i and asks for a peer in multicast group m. If
peer $p_i$ knows about a peer p'' that is part of m, it sends the IP address of p'' to p,
and p connects to p''. Note that p'' is not necessarily a member of the RIG-i inner
group. In any case, $p_i$ adds p to the peers listening to m, and shares this information
with all peers in RIG-i. If p'' does not exist, p proceeds similarly for RIG-(i - 1): p
looks up $m_{i-1}$ inside p's tier-(i - 1) group (i.e., among p's sibling groups at tier i).

This process is repeated until a peer receiving m is found, or RIG-1 is reached. In
the latter case, if there is still no peer listening to m, peer p must connect directly to
the source of the multicast group. The search for a peer to connect to is thus done
bottom up.
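The two steps above can be sketched as follows. This is an illustration only. The RIG state is modelled as a hypothetical dict with a `listeners` list, and the TOPLUS look-ups that locate each RIG are assumed to have been done already. The first known listener is taken as parent (the parent selection policies are discussed later), and replication of the RIG state among its peers is not modelled.

```python
def join(p, inner_members, rigs, source):
    """Bottom-up parent search for peer p joining group m.

    inner_members: peers of p's inner group already in m (step 1).
    rigs: RIG states ordered RIG-i, RIG-(i-1), ..., RIG-1; each holds the list of
          peers it knows to be listening to m (the TOPLUS look-ups of m_i are assumed done).
    """
    if inner_members:                  # step 1: a local peer already receives m
        return inner_members[0]
    for rig in rigs:                   # step 2: climb one RIG at a time
        known = list(rig["listeners"])
        rig["listeners"].append(p)     # the contacted RIG records p in any case
        if known:
            return known[0]            # p'' need not belong to the RIG's inner group
    return source                      # RIG-1 knew nobody: connect to the source

rigs = [{"listeners": []}, {"listeners": ["q"]}, {"listeners": ["r"]}]
print(join("p", [], rigs, "S"))   # q, found at the second RIG
print(join("p2", [], rigs, "S"))  # p, now registered at the deepest RIG
```

The second join shows why every contacted RIG records the newcomer. A later peer from the same area finds p at the deepest RIG and does not need to climb further.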

Property 1. When a peer p in tier i + 1 joins the multicast tree, by construction, from
all the groups $H_{i+1}(p), H_i(p), \ldots, H_1(p)$ that contain p, p connects to a peer $q \in H_k(p)$,
where $k = \max\{\ell \in \{1, \ldots, i+1\} : \exists\, r \in H_\ell(p) \text{ already connected to the multicast tree}\}$.
That is, p connects to a peer in the deepest tier group which contains both
p and a peer already connected to the multicast tree.

This ensures that a new peer connects to the closest available peer in the network. Notice
that even in the case of failure of a peer in a RIG-i, the information is replicated in all
other peers in the RIG-i. If a whole RIG-i group fails, although MULTI+ is undeniably
affected, the look-up process can continue in RIG-(i - 1). We believe this property makes
MULTI+ a resilient system.
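Property 1 can be checked directly on the prefix hierarchy. The sketch below assumes that the chain H_0(p) = I, H_1(p), ..., H_{i+1}(p) is given as CIDR strings indexed by tier, and that the set of connected peers is known. All addresses are hypothetical.

```python
import ipaddress

def deepest_shared_group(chain, connected):
    """Return (k, H_k(p)) for the largest k in 1..i+1 such that H_k(p) contains a peer
    already in the multicast tree, or None if no such group exists (p then joins the source).

    chain: [H_0(p) = I, H_1(p), ..., H_{i+1}(p)] as CIDR strings, index = tier.
    """
    nets = [ipaddress.ip_network(g) for g in chain]
    members = [ipaddress.ip_address(r) for r in connected]
    for k in range(len(nets) - 1, 0, -1):      # tiers i+1, i, ..., 1
        if any(r in nets[k] for r in members):
            return k, str(nets[k])
    return None

H = ["0.0.0.0/0", "123.0.0.0/8", "123.13.0.0/16", "123.13.78.0/23"]
print(deepest_shared_group(H, ["123.14.2.2", "123.13.80.1"]))  # (2, '123.13.0.0/16')
```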

Property 2. Using multicast over TOPLUS, the total number of flows in and out of a 
group defined by an IP network prefix is bounded by a constant. 

Due to lack of space, we do not further develop this important aspect of MULTI+, and
refer the interested reader to the Technical Report [21]. However, the experiments
below show that the number of flows per network prefix remains tightly bounded.

Membership Management. Each peer p knows its parent q in the multicast tree, because
there is a direct connection between them. Because p knows the RIG where it got its
parent's address, if p's parent q in level i of the multicast tree fails or disconnects, p
goes directly to the same RIG and asks for a new parent. If there is none, p becomes
the new tree node at level i, replacing q. Then p must find a parent in level i - 1 of the
multicast tree, through a join process starting at that RIG. If p had any siblings under
its former parent, those siblings will find p as the new parent when they proceed like p.
If more than one peer concurrently tries to become the new node at level i, peers in the
RIG must agree on one of them by consensus. It is not critical if a set of peers sharing
a parent q is divided into two subsets with different parents upon q's departure.

Joins and leaves are frequent in a P2P network, but we expect the churn to
be rather low because, in a multicast tree, all peers seek the same content
concurrently, throughout the duration of the session.

Parent Selection Algorithms. From the ideas presented above, we retain two main parent
selection algorithms for testing the construction of multicast trees.

- FIFO, where a peer joins the multicast tree at the first parent found with a free
connection. When a peer gets to a RIG to find a parent, the RIG answers with a list
of already connected peers. This list is ordered by arrival time to the RIG. Naturally,
the first peers to arrive are connected closer (in hops) to the source. The arriving peer tests each
possible peer in the list, starting with the first one, until it finds one that accepts a
connection.

- Proximity-aware, where, if the first parent in the list has all its connections
occupied, a peer connects to the closest parent in the list that still allows one extra
connection.
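The two policies differ only in what happens when the first peer in the RIG's list is saturated. The sketch below uses hypothetical peer names, free-slot counts and distances (for example, TC-coordinate distances). The `dist` callback stands in for whatever proximity estimate is available.

```python
def fifo(candidates, free):
    """FIFO: first peer, in RIG arrival order, that still accepts a connection."""
    return next((c for c in candidates if free[c] > 0), None)

def proximity_aware(candidates, free, dist):
    """First peer in the list if it has a free slot; otherwise the closest peer
    in the list that can still take one more connection."""
    if candidates and free[candidates[0]] > 0:
        return candidates[0]
    open_slots = [c for c in candidates if free[c] > 0]
    return min(open_slots, key=dist, default=None)

order = ["a", "b", "c"]                   # list returned by the RIG, by arrival time
free = {"a": 0, "b": 1, "c": 2}           # remaining outgoing connections
delay = {"a": 5.0, "b": 40.0, "c": 12.0}  # distance from the joining peer
print(fifo(order, free))                        # b
print(proximity_aware(order, free, delay.get))  # c
```

When the first peer still has a free slot, both functions return it. This matches the remark below that proximity is not always verified.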

Note that we do not always verify if we are connecting to the closest parent in the 
list. The idea behind this is that, while we implicitly trust MULTI+ to find a close parent, 
we prefer to connect to a peer higher in the multicast tree (fewer hops from the source) 
than to optimize the last hop delay. If MULTI+ works correctly, the difference between 
these two policies should not be excessive, because the topology-awareness is already 
embedded in the protocol through TOPLUS. 

4 MULTI+ Performance 

The $O(n^2)$ cost of actively measuring the full inter-host distance matrix for n
peers limits the size of the peer sets we can use [21]. P2P systems must be designed to be
potentially very large, and experiments should reflect this property by using significant
peer populations. Methods like [22] map hosts into an M-dimensional coordinate space.
The main advantage is that, given a list of n hosts, the coordinates for all of them can be
obtained in $O(Mn)$ time by actively measuring the distances of the hosts to a set of M landmark
hosts, with M < n.
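A back-of-the-envelope count shows the gap. The sketch below assumes one probe per measured host pair and uses an arbitrary, illustrative number of landmarks (M = 10), which is not the value used in [22].

```python
def full_matrix_probes(n):
    """Pairwise measurements for the full n x n distance matrix: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2

def landmark_probes(n, m):
    """Landmark-based coordinates: each of n hosts probes M landmarks, i.e. O(Mn)."""
    return n * m

n = 5000   # peer population used in the experiments below
m = 10     # illustrative landmark count only
print(full_matrix_probes(n))  # 12497500
print(landmark_probes(n, m))  # 50000
```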

TC Coordinates. CAIDA [23] offers researchers a set of network distance
measurements from so-called Skitter hosts to a large number of destinations. Skitter is a
traffic measurement application developed by CAIDA. In a recent paper [22], the
authors have used these and other data to obtain a multi-dimensional coordinate space
representing the Internet. A host location is denoted by a point in the coordinate space,
and the latency between two hosts can be calculated as the distance between their
corresponding points. The authors of [22] have kindly provided us with the coordinates
of 196,297 IP addresses for our study. Hereafter we call this space the TC (from Tang
and Crovella) coordinate space. We calculate distances using a Euclidean metric, defined as
$D(x_i, x_j) = \sqrt{\sum_{k=1}^{M} (x_{ik} - x_{jk})^2}$ for any two hosts identified by their
M-coordinate vectors $x_i$ and $x_j$.
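The distance formula is a plain Euclidean norm over the M coordinates. In the sketch below, the 3-dimensional coordinates are hypothetical; the actual dimension M of the TC space is not stated here.

```python
import math

def tc_distance(x_i, x_j):
    """Euclidean distance D(x_i, x_j) between two M-dimensional coordinate vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("both hosts need coordinates of the same dimension M")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

print(tc_distance((0.0, 0.0, 0.0), (3.0, 4.0, 12.0)))  # 13.0
```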

5000 Peers Multicast Tree. In this experiment we test the characteristics of multicast
trees built with MULTI+ using the TC coordinate space and a set of 5,000 peers. We
use the coordinate space to measure the distance between every pair of hosts. In order to
make the experiment as realistic as possible, we use a TOPLUS tree with routing tables
of reduced size, obtained by grouping small and medium-sized tier-1 groups
into virtual groups; this process introduces a distortion in the topological fidelity
of the resulting tree [19]. The 5,000 peers are organized into a TOPLUS tree with 59
tier-1 groups, 2,562 inner groups, and up to 4 tiers. We evaluate the two parent
selection policies described before: FIFO and proximity-aware. We also compare these
two approaches with random parent selection. In all cases we test MULTI+ both with
no limit on the number of connections a peer can accept and with a limit of
2 to 8 connections per peer. In the test we measure the following
parameters, presented here using their CDF (Cumulative Distribution Function):

- The percentage of peers in the whole system that, once the full multicast tree is built,
are closer to a given peer than that peer's parent. These figures exclude the peers directly
connected to the source (Figure 5).

- The level peers occupy in the multicast tree. The more levels in the multicast tree, the
more delay we incur along the transmission path and the more the transmission
becomes subject to losses due to peer failure (Figure 6).

- The latency from the root of the multicast tree to each receiving peer (Figure 7).

- The number of multicast flows that go into and out of each TOPLUS group (network)
(Figure 8).
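The first metric can be computed for each peer from the coordinates. The sketch below assumes the percentage is taken over all peers other than the peer itself and its parent, because the exact denominator is not stated. The 2-D coordinates are made up. The sketch also includes the empirical CDF used to summarize such per-peer values.

```python
import math

def pct_closer_than_parent(peer, parent, coords):
    """Percentage of the other peers that are closer to `peer` than its parent is."""
    d_parent = math.dist(coords[peer], coords[parent])
    others = [q for q in coords if q not in (peer, parent)]
    closer = sum(1 for q in others if math.dist(coords[peer], coords[q]) < d_parent)
    return 100.0 * closer / len(others)

def empirical_cdf(values):
    """(x, F(x)) points of the empirical CDF of a list of per-peer values."""
    xs = sorted(values)
    return [(x, (i + 1) / len(xs)) for i, x in enumerate(xs)]

coords = {"p": (0, 0), "parent": (5, 0), "a": (1, 0), "b": (10, 0), "c": (0, 3)}
print(pct_closer_than_parent("p", "parent", coords))  # about 66.7: a and c are closer
print(empirical_cdf([3, 1, 2]))
```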

From our experiments we obtain very satisfactory results. From Figures 5 to 8 we
draw a number of conclusions:

- Individual peers do not need to support a large number of outgoing connections
to benefit from MULTI+ properties: three connections are feasible for broadband
users, and the marginal improvement obtained with 8 connections is not very significant.

- The proximity-aware policy performs better than FIFO in terms of end-to-end
latency (Figure 7) and connection to the closest parent (Figure 5). However, with respect
to the number of flows per group (Figure 8) and level distribution in the multicast
tree (Figure 6), the two policies are very similar. This is because both trees follow the TOPLUS
structure, but the proximity-aware policy takes better decisions when the optimal
parent peer has no available connections.

- In Figure 5(c) we can see that, for the proximity-aware policy, having no connection
restrictions makes closeness to the parent less optimal than having restrictions. This is
expected: when connections are available, a peer's main goal is to connect
as high in the multicast tree as possible. Figure 6 shows how peers are organized in
fewer levels, and Figure 7 how the root-to-leaf latency is better for the unrestricted
connection scheme. Still, we can assert that the multicast tree follows (when
possible) a topology-aware structure, because most peers connect to nearby parents.

- The random parent selection policy organizes the tree in fewer levels than the other
two policies (Figure 6(a)), because connections are not constrained to follow the
TOPLUS structure. However, those connections are not optimized, and the resulting
performance in every other respect, including end-to-end delay, is considerably poorer.

Fig. 5. Percentage of peers in the whole system closer than the parent actually used (for peers
not connected to the source). (a) Random parent selection. (b) FIFO parent selection.
(c) Proximity-aware parent selection.

Fig. 6. Level of peers in the multicast tree. (a) Random parent selection. (b) FIFO parent
selection. (c) Proximity-aware parent selection.

Fig. 7. Latency from root to leaf (in TC coordinate units) in the multicast tree. (a) Random
parent selection. (b) FIFO parent selection. (c) Proximity-aware parent selection.

Fig. 8. Number of flows through group interface.



5 Conclusion and Future Work 

We have presented MULTI+, a method to build application-level multicast trees on P2P
systems. MULTI+ relies on TOPLUS in order to find a proper parent for a peer in the
multicast tree. MULTI+ can create topology-aware content distribution infrastructures
without introducing extra traffic for active measurement. Admittedly, out-of-band
information regarding the TOPLUS routing tables must be calculated offline (a simple process)
and downloaded (just as many P2P systems today require downloading a list of peers for the
join process). The proximity-aware scheme improves the end-to-end latency, and using host
coordinates calculated offline and obtained at join time (as is done for TOPLUS) avoids the
need for any active measurement. MULTI+ also decreases the number of redundant flows that
must traverse a given network, even when only a few connections per peer are possible, which
allows for better bandwidth utilization. As future work, we plan to evaluate the impact of leaving and
failing peers on the multicast tree performance, and to compare its properties with
other systems.
Abstract. We present an instrumental approach to assessing the
perceptual quality of voice transmissions in IP-based communication
networks. Our approach is end-to-end and uses combinations of common
codecs, loss concealment algorithms, playout schedulers, and ITU's
quality assessment algorithms E-Model and PESQ. It is the first method that
takes the impact of playout rescheduling and non-random packet loss
distributions into account. Non-random packet losses occur if a
rate-distortion optimized multimedia streaming algorithm forwards packets
depending on the packets' importance.

Our approach is implemented in open-source software. We have
conducted formal listening-only tests to verify the accuracy of our quality
model. In the majority of cases, the human test results show a high
correlation with the calculated predictions.

Keywords: VoIP, quality assessment, playout scheduling, rate- 
distortion optimized streaming. 



1 Introduction 

Instrumental perceptual assessment methods predict the behavior of humans 
rating the quality of multimedia streams. The ITU has standardized a psychoacoustic
quality model called PESQ, which predicts the human rating of speech
quality and calculates a mean opinion score (MOS) value [1]. Another model, 
the E-Model [2], evaluates the configurations of telephone systems. Among other 
factors it takes coding mode, packet loss rate, and absolute transmission delay 
into account to give an overall rating of the quality of telephone calls. Both 
models consider most sources of impairment which could occur in a telephone 
system. For example, they can predict the impact of the mean packet loss rate on 
speech quality. However, they do not consider packet losses if the loss depends on 
the packets’ content or importance. Furthermore, they cannot be directly applied 
to traces of VoIP packets, which are produced by experimental measurements 
or network simulations. 

* This work has been supported by Deutsche Forschungsgemeinschaft (DFG) via the 
AKOM program. 
To overcome these deficiencies we have developed a systematic approach that
combines ITU's E-Model, ITU's PESQ algorithm, and various implementations
of codecs and playout schedulers. Our software encodes a speech sample,
analyzes a given trace of VoIP packets, simulates multiple playout schedulers, and
finally assesses the quality of telephone services (coding distortion, packet loss,
transmission delay and playout rescheduling). Thus, it can determine the final
packet loss rate, speech quality, mean transmission delay and conversational call
quality. In this regard, we have made the following contributions, which this
paper summarizes and presents in relation to each other.

— We developed a formula in [3] on how to include PESQ into the E-Model. 
The ITU approved this formula as a standard extension. 

— We conducted formal listening-only tests to verify the prediction perfor- 
mance of PESQ for impairments due to non-random packet losses [4] and 
playout rescheduling, which are caused by rate-distortion optimized stream- 
ing and adaptive playout scheduling respectively. The overall correlations 
are R=0.94 and R=0.87 respectively. 

— Finally we implemented the most common playout schedulers and provide 
them to the research community as open-source software [5]. Because
perceptual speech quality assessment is computationally complex, we provide a
tool which runs the calculations in parallel.

Our approach outperforms previous algorithms because it considers not only
the impact of playout rescheduling but also transmission delay, speech
quality and non-random packet loss distribution. Altogether, we
are able to predict the quality of VoIP transmissions with a precision that has
not been reached before.

The paper is structured as follows: In section 2 we present the technical 
background and discuss related work. How to combine PESQ, E-Model and 
playout schedulers is explained in section 3. The next section contains the results 
of listening-only tests. Finally, in section 5, we draw conclusions. 



2 Background 

2.1 Internet Telephony 

The principal components of a VoIP system, which covers the end-to-end
transmission of voice, are displayed in Fig. 1. First, at the source, analogue
processing, digitization, encoding, packetization, and protocol processing (RTP, UDP,
and IP) are conducted. Then, the resulting packets are transmitted through the
network, consisting of an Internet backbone and access networks. At the receiver,
protocols process the packets and deliver them to the playout scheduler/buffer.
In the next step, the multimedia frames are decoded and played out. Because
telephony involves bidirectional transmission, a similar transmission takes place
in the reverse direction. In the following paragraphs we discuss some
components in more detail to show how they cause the service quality to degrade.
Fig. 1. VoIP transmission of a telephone call



Network: On the Internet and in the access networks, packets can get lost
because of congestion or (wireless) transmission errors. The packet loss process
can be controlled to optimize the perceived service quality:

Chou et al. [6] suggest forwarding multimedia packets according to their
estimated distortion and error propagation. They proposed a rate-distortion optimized
multimedia streaming framework for packetized and lossy networks.

De Martin [7] proposed an approach called Source-Driven Packet Marking,
which controls the priority marking of speech packets in a DiffServ network.
If packets are assumed to be perceptually critical, they are transmitted in a
premium traffic class.

Sanneck used a modified Random Early Detection (RED) scheme at packet
forwarding nodes [8]. If a node is congested, the probability of packet dropping should
depend on the packet markings. Additionally, Sanneck proposes to mark G.729
coded voice packets according to their estimated importance.

All three algorithms handle packets in a content-sensitive manner. Therefore,
the dropping of packets might depend on their marking and content. Thus, it is
inadequate to measure only the mean packet loss rate for predicting the
speech quality.
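A toy example shows why. The sketch below marks ten packets with hypothetical importance flags. A content-blind dropper and an importance-aware dropper both lose 20% of the packets, yet only the former loses important ones, so the mean loss rate alone cannot tell the two traces apart.

```python
def loss_stats(lost, important):
    """Mean loss rate and number of lost important packets for one trace."""
    rate = sum(lost) / len(lost)
    important_lost = sum(1 for is_lost, imp in zip(lost, important) if is_lost and imp)
    return rate, important_lost

T, F = True, False
important = [T, F, F, T, F, F, T, F, F, F]    # hypothetical importance markings
blind_drop = [T, F, F, T, F, F, F, F, F, F]   # content-blind: loses packets 0 and 3
marked_drop = [F, T, T, F, F, F, F, F, F, F]  # importance-aware: loses packets 1 and 2
print(loss_stats(blind_drop, important))   # (0.2, 2)
print(loss_stats(marked_drop, important))  # (0.2, 0)
```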

Playout scheduler: At the receiver, a playout buffer stores packets so that they 
can be played out in a time-regular manner, concealing variations in network 
delay (jitter). As the playout buffer contributes to the end-to-end delay it should 
not store packets longer than necessary. Instead, the playout buffer should drop 
packets that arrive too late to be played out at the scheduled time. 

The playout scheduling can be static: if packets exceed a given transmission
time, they are discarded (we will refer to this scheme as fixed playout buffer).
Alternatively, adaptive playout buffers redefine the playout time in accordance
with the delay process of the network [9,10]. We refer to this kind of adaptation
as rescheduling. The playout schedule can be adjusted easily during silence,
because the adjustment is then not noticeable. Adjustments during voice activity require more
sophisticated concealment algorithms [11].
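The fixed scheme is simple to express. The sketch below assumes that sender and receiver clocks are aligned, so that arrival time minus send time is the transmission delay, and it uses made-up timestamps. It shows how late discards add to network losses in the final loss rate seen by the decoder.

```python
def fixed_playout(send_times, arrival_times, playout_delay):
    """Fixed playout buffer: packet n is due at send_times[n] + playout_delay and is
    discarded if it arrives after that. arrival_times[n] is None for network losses."""
    played, late, network_lost = [], 0, 0
    for n, (sent, arrived) in enumerate(zip(send_times, arrival_times)):
        if arrived is None:
            network_lost += 1
        elif arrived - sent > playout_delay:
            late += 1                     # arrived too late: dropped by the buffer
        else:
            played.append(n)
    final_loss = (network_lost + late) / len(send_times)
    return played, late, final_loss

send = [0, 20, 40, 60, 80]               # ms, one 20 ms frame per packet
arrive = [50, 75, None, 150, 125]        # ms, packet 2 lost in the network
print(fixed_playout(send, arrive, 70))   # ([0, 1, 4], 1, 0.4)
```

A longer playout delay would save packet 3 but would add to the end-to-end delay. This is the trade-off that adaptive schedulers try to manage through rescheduling.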
2.2 Quality Assessment

The perceived quality of a service can be measured with subjective tests. Humans
evaluate the quality of service according to a standardized quality assessment
process [12]. Often the quality is described by a mean opinion score (MOS)
value, which ranges from 1 (bad) to 5 (excellent). Listening-only tests are
time-consuming. Especially if many tests have to be made, the effort of subjective
evaluation is prohibitive. Fortunately, in recent years, considerable effort has
been made to develop instrumental measurement tools, which predict human
rating behavior. We briefly explain the approaches used in this paper.

The perceptual evaluation of speech quality (PESQ) algorithm predicts
human rating behavior for narrowband speech transmission [1]. It compares an
original speech fragment with its transmitted and thus degraded version to
determine an estimated MOS value. For multiple known sources of impairment
(typical for analogue, digital and packetized voice transmission systems) it shows
a high correlation (about 0.94) with human ratings.

The quality of a telephone call cannot be judged by the speech quality alone.
The ITU E-Model [2] additionally considers end-to-end delay, echoes, side-tones,
loudness and other factors to calculate the so-called R-factor. A higher R-factor
corresponds to better telephone quality, with 0 the worst value, 70 the minimum
quality of telephone calls ("toll quality"), and 100 the best value.
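For illustration, the sketch below assumes the standard ITU-T G.107 relations. The rating is obtained from impairment factors as R = Ro - Is - Id - Ie,eff + A, and it maps to an estimated MOS through a cubic that is clamped to 1 below R = 0 and to 4.5 above R = 100. The input values used here are arbitrary.

```python
def r_factor(ro, i_s, i_d, i_e_eff, a=0.0):
    """E-Model transmission rating R = Ro - Is - Id - Ie,eff + A (G.107, assumed)."""
    return ro - i_s - i_d - i_e_eff + a

def r_to_mos(r):
    """Estimated MOS from R via the G.107 cubic mapping, clamped to [1, 4.5]."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

print(round(r_to_mos(70), 3))    # 3.597, at the "toll quality" threshold R = 70
print(round(r_to_mos(93.2), 2))  # 4.41
```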



2.3 Related Work 

Markopoulou et al. [13] measured the performance of a couple of Internet
backbone links and analyzed them with ITU's E-Model. Their findings show that
the quality of VoIP depends largely not only on the provider's link quality but
also on the playout buffer scheme.

Hammer et al. [14] suggest using PESQ to assess the speech quality of a
VoIP packet trace. They propose to split the trace into overlapping subparts.
The benefit of this approach is that different coding schemes and also packet
marking algorithms can be judged. Also, FEC or different playout schedulers
can be supported in principle.

An approach that also considers interactivity is presented by Sun and Ifeachor [15]. The authors suggest combining the E-Model and PESQ and describe a set of equations, which they derived from linear approximations of the rating behavior of PESQ and the E-Model and of the relationship between packet loss rate and speech quality.

3 Combining E-Model, PESQ, and Playout Schedulers 

Considering the characteristics of VoIP packet transmission on the one hand and the capabilities of perceptual models on the other, we identify the following aspects in which existing approaches are incomplete:
Fig. 2. Speech and delay assessment



— Perceptual quality assessment has to take into account the entire processing 
chain from source to sink, including encoding, routing across the Internet, 
de-jittering, decoding and playing at the receiving side, because only this 
reflects human-to-human conversation. Thus, when studying a transmission 
of VoIP packets the entire transmission system has to be considered. 

— The end-to-end quality depends largely on the playout buffer scheme [13]. However, an “ideal” playout scheduler has not yet been identified, and implementers of VoIP phones are free to choose any scheme. Thus, to predict the impact of playout scheduling, one has to consider all playout schedulers or, if that is not possible, the most common ones.

— Rescheduling by adaptive playout schedulers harms speech quality because of temporal discontinuities. The E-Model does not take the dynamics of a transmission into account because it relies on static transmission parameters. PESQ, in contrast, considers playout adaptation but does not include the absolute delay in its rating algorithm. PESQ was designed to judge the impact of playout scheduling, but has not yet been validated for this purpose [16].

— The E-Model does not consider non-random packet losses, and PESQ has not been verified for this kind of distortion, so its prediction accuracy is unknown.

To overcome these shortcomings, we combine the E-Model, PESQ and playout schedulers as shown in Fig. 2. First, a set of the most common playout scheduling schemes (including the fixed-deadline and adaptive algorithms [9,10] of Van Jacobson, Mills, Schulzrinne, Ramjee, and Moon) calculates the packets’ playout times and the mean transmission delay. Note that only speech frames during voice activity are considered, because during silence a human cannot perceive the transmission delay.¹
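To make the first stage concrete, the sketch below implements one adaptive scheduler of the kind listed above: an exponentially weighted estimate of the network delay and its variation, in the style commonly attributed to Ramjee et al. (derived from Van Jacobson's RTT estimator). The constants alpha = 0.998002 and beta = 4 are the values usually cited for that algorithm and should be checked against [9,10]; the `Frame` type and `adaptive_playout` function are hypothetical names, not part of the software package described here. As in the text, the playout delay is only changed at the start of a talk spurt, and the mean delay is taken over played frames during voice activity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    send_time: float        # sender timestamp (ms)
    arrival_time: float     # receiver timestamp (ms)
    talkspurt_start: bool   # True for the first frame of a talk spurt


def adaptive_playout(frames: List[Frame], alpha: float = 0.998002,
                     beta: float = 4.0) -> Tuple[List[Optional[float]], float]:
    """Ramjee-style adaptive playout (hypothetical helper).

    Returns the playout time of every frame (None if the frame arrived too late
    and is dropped) and the mean end-to-end delay of the frames that are played.
    The trace is assumed to contain voice activity only and to start a talk spurt.
    """
    d_hat: Optional[float] = None   # smoothed network delay estimate
    v_hat = 0.0                     # smoothed delay variation estimate
    offset = 0.0                    # playout delay applied to the current talk spurt
    playout: List[Optional[float]] = []
    played_delays: List[float] = []
    for f in frames:
        n = f.arrival_time - f.send_time          # network delay of this frame
        if d_hat is None:
            d_hat = n
        else:
            d_hat = alpha * d_hat + (1 - alpha) * n
            v_hat = alpha * v_hat + (1 - alpha) * abs(d_hat - n)
        if f.talkspurt_start:
            offset = d_hat + beta * v_hat         # reschedule only at talk-spurt start
        t_play = f.send_time + offset
        if f.arrival_time <= t_play:
            playout.append(t_play)
            played_delays.append(offset)
        else:
            playout.append(None)                  # late frame: dropped
    mean_delay = sum(played_delays) / len(played_delays) if played_delays else float("nan")
    return playout, mean_delay
```

In the framework above, the same packet trace would be run through every scheduler in the set, and each scheduler's loss pattern and mean delay passed on to PESQ and the E-Model.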

Next, PESQ calculates the speech quality, which depends on coding distortion, non-random packet loss and playout rescheduling. Because PESQ has not been verified for non-random packet loss and playout rescheduling, we conduct formal listening tests to verify its accuracy (see Section 4). Finally, both the speech quality and the mean transmission delay are fed into the E-Model. We assume the

1 Indeed, some playout schedulers change the playout time at the start of a talk spurt. 
Others change it at the beginning of silence periods. Both have to be considered as 
equal with respect to the transmission delay. 
acoustic processing to be optimal [3]. Therefore, we can simplify the E-Model to a model with only a few parameters. The R-factor is then computed as:

$$R_{\mathrm{factor}} = \mathrm{MOS2R}(\mathrm{MOS}_{\mathrm{PESQ}}) - I_{dd}(T) \qquad (1)$$

Reference [3] describes the function MOS2R and the conditions under which (1) can be applied. For a definition of the delay impairment function $I_{dd}$, which depends on the mean transmission delay $T$, we refer to [2].
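A minimal sketch of Eq. (1) follows, under two assumptions that should be checked against [2,3]: MOS2R is taken to be the numerical inverse of the R-to-MOS mapping of ITU-T G.107 (reference [3] may define it differently), and I_dd is our reading of the G.107 delay impairment with the delay T in milliseconds. All function names are hypothetical.

```python
import math


def r_to_mos(r: float) -> float:
    """R-factor to MOS mapping as given in ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def mos_to_r(mos: float, tol: float = 1e-9) -> float:
    """Assumed stand-in for MOS2R: invert r_to_mos by bisection on [0, 100]."""
    mos = min(max(mos, 1.0), 4.5)
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def delay_impairment(t_ms: float) -> float:
    """Delay impairment I_dd for a one-way delay t_ms (our reading of G.107)."""
    if t_ms <= 100:
        return 0.0
    x = math.log10(t_ms / 100) / math.log10(2)
    return 25 * ((1 + x ** 6) ** (1 / 6) - 3 * (1 + (x / 3) ** 6) ** (1 / 6) + 2)


def r_factor(mos_pesq: float, mean_delay_ms: float) -> float:
    """Eq. (1): R = MOS2R(MOS_PESQ) - I_dd(T)."""
    return mos_to_r(mos_pesq) - delay_impairment(mean_delay_ms)
```

For example, a PESQ score of 4.0 measured on a trace whose scheduler produced a mean delay of 150 ms would be rated as `r_factor(4.0, 150.0)`. Bisection is used because the G.107 curve has no convenient closed-form inverse.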

Software Package: We have implemented the approach discussed above and provide it as open source to the research community. The software covers the digital processing chain of VoIP. To be fully operational, the PESQ algorithm and a G.729 codec have to be purchased from their rights owners. Alternatively, they can be downloaded free of charge from the ITU’s web page, for trial use only. Further information can be found on our web page and in the manual [5].

We have tried to verify the correctness of our software in several ways. Publishing the software together with its source code ensures that more users will use it and study its code, which increases the pace at which potential errors are found. Last but not least, we have used our tool set in various projects, including the assessment of voice over WLAN and of the impact of handover and wireless link scheduling. Overall, we are confident in the correctness of our implementation.

4 Listening-Only Tests 

PESQ has not been verified for all causes of impairment. In [4], we conducted formal listening-only tests to determine PESQ’s prediction performance in cases of single or non-random packet losses. In this section, we verify whether PESQ can measure the impairment caused by playout rescheduling. This verification is important because PESQ was not designed for this kind of impairment and here operates outside the scope of its operational specification [16].

To verify PESQ, we construct artificially degraded samples and assess them both with listening-only tests and with instrumental predictions. If PESQ and the human tests yield similar results for the samples, PESQ is verified. The results of speech quality tests are usually compared via correlation; thus, the correlation coefficient R between subjective and objective speech quality results is our measure of similarity. R = 1 means that the results are perfectly (linearly) related; if no correlation is present, R is zero. To compare absolute subjective and instrumental MOS values, we apply linear regression to one set of values, as is common practice. The correlation R does not change after this linear regression.
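The comparison procedure can be sketched as follows (function names are hypothetical): Pearson's correlation between subjective and PESQ MOS values, and a least-squares line that maps PESQ scores onto the subjective scale. Pearson's R is invariant under a positive affine rescaling of either variable, so, as long as the fitted slope is positive, the mapping shifts absolute MOS values without changing R.

```python
import math
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient R between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b


def map_to_subjective(pesq: List[float], subjective: List[float]) -> List[float]:
    """Map PESQ MOS values onto the subjective scale using the fitted line."""
    a, b = fit_line(pesq, subjective)
    return [a + b * p for p in pesq]
```

A side effect of the least-squares fit is that the mapped PESQ values have the same mean as the subjective ones, which removes any constant offset between the two scales.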

Sample Design: Analysis of Internet traces has shown that packet delays sometimes show a sharp, spike-like increase [9,13] that cannot be predicted in advance. Delay spikes are short increases of the packet transmission times, which usually occur after congestion or, on a wireless link, after fading. Soon after the spike, the following packets arrive in quick succession until the transmission delay has returned to its normal value (Fig. 3). We consider the question
Fig. 3. Delay spike

of whether to adjust the playout of speech frames to delay spikes, concentrating on the non-trivial case of delay spikes during voice activity.

To construct the samples, we used the software package described in this paper. It generates artificial packet traces that contain delay spikes; the frequency, height and width of the delay spikes can be controlled. Three different playout strategies are analyzed. First, we drop every packet that is affected by the spike and thus arrives too late. Second, in case of a delay spike, the playout is rescheduled so that no packet is dropped; as a consequence, the playout delay increases. The last strategy is similar to the second, but after a spike the playout delay is adjusted during silence periods until it returns to normal.
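The first two strategies can be illustrated with a short sketch. The trace generator below is hypothetical and not the one in our software package: packets are sent every `interval` ms, one packet is delayed by `height` ms, and the packets queued behind it then arrive `gap` ms apart until the delay is back to `base`, mimicking Fig. 3. The third strategy (adapt&fallback) needs talk-spurt boundaries and is not shown.

```python
from typing import List, Tuple

Trace = List[Tuple[float, float]]   # (send_time, arrival_time) in ms


def spike_trace(n: int, spike_at: int, height: float, base: float = 50.0,
                interval: float = 20.0, gap: float = 1.0) -> Trace:
    """Hypothetical generator for a trace with one delay spike.

    Packet `spike_at` is delayed by `height` ms on top of `base`; the packets
    queued behind it then arrive `gap` ms apart, so their extra delay shrinks by
    (interval - gap) per packet until it is back to zero.
    """
    trace: Trace = []
    for i in range(n):
        send = i * interval
        extra = 0.0
        if i >= spike_at:
            extra = max(0.0, height - (i - spike_at) * (interval - gap))
        trace.append((send, send + base + extra))
    return trace


def play_drop(trace: Trace, offset: float) -> Tuple[int, float]:
    """Strategy 1: keep the playout offset fixed and drop every late packet."""
    dropped = sum(1 for send, arrival in trace if arrival > send + offset)
    return dropped, offset


def play_adapt(trace: Trace, offset: float) -> Tuple[int, float]:
    """Strategy 2: postpone the playout whenever a packet would be late.

    No packet is dropped, but the (only ever growing) offset is returned so the
    added playout delay can be passed on to the E-Model.
    """
    for send, arrival in trace:
        if arrival > send + offset:
            offset = arrival - send
    return 0, offset
```

The trade-off studied below is visible here: `play_drop` loses frames but keeps the delay, while `play_adapt` keeps every frame at the cost of a playout delay that grows with the spike height.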

We constructed 220 samples (approx. 5 to 10 s long), encoded with G.711 or G.729 and each containing one delay spike with a height of 50 to 300 ms and a width of 55 to 330 ms.

Formal Listening Tests: The listening-only tests closely followed the ITU recommendation [12], Appendix B, which describes methods for the subjective assessment of quality. The tests took place in a professional sound studio (46 m², low environmental noise, etc.). Nine persons judged the quality of 164 samples. The samples’ language is German, which all listeners understand.

We deviated from the ITU’s recommendations where scientific results suggest changes that improve rating performance. For example, we used high-quality studio headphones instead of an Intermediate Reference System, because headphones have better sound quality. Further, multiple persons were in the room at the same time to reduce the duration of the experiment.

Last but not least, we did not apply “Absolute Category Rating”, because a discrete MOS scale makes it difficult to compare two only slightly different samples, and the impact of a single frame loss is indeed very small. Instead, we allow intermediate values and use a linear MOS scale. PESQ calculates MOS values with a resolution of up to $10^{-6}$ on the MOS scale.

Results: Ten persons gave a total of 2210 judgments. We could use only 2033 of them, because some test persons lost track of the sample number.
Fig. 4. Playout strategy: delay spike height vs. speech quality.



The rating performance during the second half of the test was significantly worse than during the first half. We also compared a group of native speakers with a group of foreign students; both showed similar rating performance. This leads us to conclude that concentration is more important than being a native speaker.

Figure 4 displays the speech quality versus the spike height. We show the rating results of humans and PESQ for the different adaptation policies. The black lines (drop) refer to dropping every late packet during the delay spike. The blue lines (adapt) display the results when the playout is delayed after a delay spike. Finally, the red lines (adapt&fallback) include the effect of falling back to the original playout time as soon as possible; this later rescheduling occurs only during silence periods.



Analysis: In the experiments, the sample content differs for each delay spike height. Because the sample content has a large influence on the speech quality ratings, absolute MOS values cannot be compared across the horizontal axis of Figure 4 (which displays the delay spike height). However, the playout strategies can be ranked against each other for the same delay spike height.

If the delay spike’s height is 200 ms or larger, dropping packets is more beneficial than delaying the playout. Further, the “fallback” adjustments during silence degrade the speech quality: the “adapt” strategy always performs better than “adapt&fallback”.

Table 1 displays the prediction performance of PESQ. The overall correlation is R = 0.866. If one considers only samples that contain modulated noise (MNRU), the correlation is nearly perfect (R = 0.978). We also observe a relationship between the MOS variance and the correlation: for a given sample set, the more the samples differ, the higher the correlation. For the “drop” strategy, for example, both the PESQ variance and the correlation are low. We assume that humans cannot distinguish degraded samples that differ only slightly.

Comparing the absolute MOS values in Table 1, one can see that there exists 
a constant offset between instrumental and subjective MOS values. As we can 
Table 1. Listening-only test: delay spikes

Selection criterion         Samples   MOS     PESQ MOS   PESQ var.   Correlation
all                           113     2.518    2.280      0.723       0.866
MNRU: yes                      13     3.013    2.823      1.015       0.978
Coding: G.711                  33     2.565    2.277      0.564       0.856
Coding: G.729                  63     2.284    2.028      0.373       0.668
Spike height: 100 ms           15     2.844    2.680      0.882       0.906
Spike height: 200 ms           18     2.228    1.873      0.141       0.737
Spike height: 300 ms           18     2.280    1.998      0.246       0.768
Playout: drop                  32     2.220    2.039      0.088       0.476
Playout: adapt                 32     2.453    2.243      0.717       0.838
Playout: adapt&fallback        32     2.469    2.058      0.541       0.799



not understand the reasons for this offset, we assume that it is due to the social behavior and emotions of our listening panel: their ratings are harsher or more lenient than the ratings used during the development of PESQ.

5 Conclusions 

In this paper, we have presented an approach to assessing the quality of VoIP transmissions. We identify important sources of quality degradation that can occur in a VoIP system; in particular, the impact of playout rescheduling and non-random packet losses has not been considered in previous approaches.

We combine PESQ, the E-Model, different coding schemes and playout schedulers to analyze VoIP packet traces. The ITU has approved the mathematical combination of PESQ and the E-Model as a standard extension.

PESQ is verified with formal listening-only tests to identify its prediction accuracy². The listening-only tests yield several results. They show that PESQ indeed predicts speech quality well in general. However, we identified some cases in which PESQ has to be improved; these improvements are beyond the scope of this paper. To enable other researchers to verify and tune their algorithms, the complete experimental data, including all samples and ratings, are available on request.

Also beyond the scope of this paper are various performance evaluations that our approach enables. For example, the assessment of playout schedulers can be used to identify the ideal one, Internet backbone traces can be assessed, and novel VoIP over WLAN systems can be developed. In particular, we will show that if the importance of speech frames is exploited [4] and non-random packet losses are enforced, the performance of VoIP over wireless can be enhanced significantly.
Abstract. In this paper, we present an analysis of the impact of using
media-dependent Forward Error Correction (FEC) in VoIP flows over 
the Internet. This error correction mechanism consists of piggy-backing 
a compressed copy of the contents of packet n in packet n + i (i being 
variable), so as to mitigate the effect of network losses on the quality 
of the conversation. To evaluate the impact of this technique on the 
perceived quality, we propose a simple network model, and study different 
scenarios to see how the increase in load produced by FEC affects the 
network state. We then use a pseudo-subjective quality evaluation tool 
that we have recently developed in order to assess the effects of FEC 
and the affected network conditions on the quality as perceived by the 
end-user. 



1 Introduction 

In recent years, the growth of the Internet has spawned a whole new generation of networked applications, such as VoIP, videoconferencing, video on demand and music streaming, which have very specific and stringent requirements in terms of network QoS. In this paper, we focus on VoIP technology, which has some particularities with respect to other real-time applications and is one of the most widely deployed to date. The current Internet infrastructure was not designed with these kinds of applications in mind, so the quality of multimedia applications is very dependent on the capacity, load and topology of the networks involved, as QoS provisioning mechanisms are not widely deployed. It is therefore necessary to develop mechanisms that make it possible to overcome the technical deficiencies of current networks when dealing with real-time applications.

Voice-over-IP applications tend to be sensitive to packet losses and to end-to-end delay and jitter. In this paper, we concentrate on the effect of FEC on packet loss, and on the effect of both on the perceived quality. While it has been shown [1] that delay and jitter have a significant impact on the perceived quality, we focus on one-way flows, whose quality is largely dominated (at the network level) by the packet loss process found in the network. The effects of FEC on interactive (two-way) VoIP applications are the subject of future studies.
In order to assess the variations in perceived quality due to the use of FEC, we use a technique we have recently developed [2,3]. The idea is to train an appropriate tool (a Random Neural Network, or RNN) to behave like a “typical” human evaluating the streams. This is done not by using biological models of perception organs, but by identifying an appropriate set of input variables, related to the source and to the network, that affect the quality, and teaching the RNN the relationships between these variables and the perceived quality. One of the main characteristics of this approach is that the result is extremely accurate (it matches very well the result obtained by asking a team of humans to evaluate the streams). In [4], we applied this method to analyze the behavior of audio communications on IP networks, with very good results when compared with real human evaluations.

The rest of the paper is organized as follows. Section 2 presents the tool we 
used to assess the perceived quality of the flows. Section 3 presents the network 
model we used for our analysis, and the effects of adding FEC to the audio 
traffic. In Section 4, we present our analysis of the effects of FEC on the quality 
of the flows. Finally, Section 5 presents our conclusions. 



2 Assessing the Perceived Quality 

Correctly assessing the perceived quality of a speech stream is not an easy task. As quality is, in this context, a very subjective concept, the best way to evaluate it is to have real people do the assessment. There exist standard methods for conducting subjective quality evaluations, such as ITU-T Recommendation P.800 [5] for telephony. The main problem with subjective evaluations is that they are very expensive (in terms of both time and manpower) to carry out, which makes them hard to repeat often. And, of course, they cannot be part of an automatic process.

Given that subjective assessment is expensive and impractical, a significant research effort has been made to obtain similar evaluations by objective methods, i.e., algorithms and formulas that measure, in a certain way, the quality of a stream. The most commonly used objective measures for speech/audio are Signal-to-Noise Ratio (SNR), Segmental SNR (SNRseg), Perceptual Speech Quality Measure (PSQM) [6], Measuring Normalizing Blocks (MNB) [7], the ITU E-model [8], Enhanced Modified Bark Spectral Distortion (EMBSD) [9], Perceptual Analysis Measurement System (PAMS) [10] and PSQM+ [11]. These methods have three main drawbacks: (i) they generally do not correlate well with human perception [12,8]; (ii) virtually all of them (one exception is the E-model) compare the original and the received streams, so they need the former to perform the evaluation, which precludes their use in a live, real-time networking context; and (iii) they generally do not take network parameters into account. Points (ii) and (iii) are due to the fact that these methods were mainly designed for analyzing the effect of coding on the streams’ quality.

The method used here [2,3] is a hybrid between subjective and objective 
evaluation. The idea is to have several distorted samples evaluated subjectively, 
and then use the results of this evaluation to teach an RNN the relation between the parameters that cause the distortion and the perceived quality. For this to work, we need to consider a set of $P$ parameters (selected a priori) which may have an effect on the perceived quality. For example, we can select the codec used, the packet loss rate of the network, the end-to-end delay and/or jitter, etc. Let this set be $\mathcal{P} = \{\pi_1, \ldots, \pi_P\}$. Once these quality-affecting parameters are defined, it is necessary to choose a set of representative values for each $\pi_i$, with minimal value $\pi_{\min}$ and maximal value $\pi_{\max}$, according to the conditions under which we expect the system to work. Let $\{p_{i1}, \ldots, p_{iH_i}\}$ be this set of values, with $\pi_{\min} = p_{i1}$ and $\pi_{\max} = p_{iH_i}$. The number of values to choose for each parameter depends on the size of the chosen interval and on the desired precision. For example, if we consider the packet loss rate as one of the parameters, and if we expect its values to range mainly from 0% to 5%, we could use 0, 1, 2, 5 and perhaps also 10% as the selected values. In this context, we call a configuration a set of the form $\gamma = \{v_1, \ldots, v_P\}$, where $v_i$ is one of the chosen values for $\pi_i$.

The total number of possible configurations (that is, $\prod_{i=1}^{P} H_i$) is usually very large. For this reason, the next step is to select a subset of the possible configurations to be subjectively evaluated. This selection may be done randomly, but it is important to cover the points near the boundaries of the configuration space. It is also advisable not to use a uniform distribution, but to sample more points in the regions near the configurations which are most likely to occur during normal use. Once the configurations have been chosen, we need to generate a set of “distorted samples”, that is, samples resulting from the transmission of the original media over the network under the different configurations. For this, we use a testbed or a network simulator.
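The enumeration and selection step can be sketched as follows, using as an example the parameter values listed later in this section. The helper names are hypothetical and the selection rule (all boundary "corner" configurations first, then uniform random draws) is only one simple way of covering the boundaries; the recommended bias towards likely operating points is not shown.

```python
import itertools
import math
import random
from typing import Dict, List, Sequence, Tuple

Config = Tuple

# Example values from this paper's parameter list (FEC offset is kept even when FEC is off).
PARAMS: Dict[str, Sequence] = {
    "codec": ["PCM16", "GSM"],
    "fec": ["OFF", "ON"],
    "fec_offset": [1, 3],
    "packetization_ms": [20, 40, 80],
    "loss_rate_pct": [2, 7, 10, 15],
    "mean_loss_burst": [1.0, 1.7, 2.5],
}


def count_configurations(values: Dict[str, Sequence]) -> int:
    """prod_i H_i: the size of the configuration space."""
    return math.prod(len(v) for v in values.values())


def all_configurations(values: Dict[str, Sequence]) -> List[Config]:
    """Every configuration: the Cartesian product of the chosen parameter values."""
    return list(itertools.product(*values.values()))


def select_configurations(values: Dict[str, Sequence], k: int, seed: int = 0) -> List[Config]:
    """Pick k configurations to be evaluated subjectively (hypothetical helper).

    Configurations where every parameter sits at its first or last listed value
    are taken first; the remaining ones are drawn uniformly at random.
    """
    configs = all_configurations(values)
    ends = [(v[0], v[-1]) for v in values.values()]
    corners = [c for c in configs if all(x in e for x, e in zip(c, ends))]
    corner_set = set(corners)
    rest = [c for c in configs if c not in corner_set]
    rng = random.Random(seed)
    chosen = corners[:k]
    chosen += rng.sample(rest, max(0, k - len(chosen)))
    return chosen
```

With these six parameters the space holds 2 x 2 x 2 x 3 x 4 x 3 = 288 configurations, of which `select_configurations(PARAMS, 115)` would pick 115, the number of sets used in the campaign described below.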

Formally, we must select a set of $M$ media samples $(\sigma_m)$, $m = 1, \ldots, M$, for instance $M$ short pieces of audio (subjective testing standards advise using sequences with an average length of 10 s, following [5], for instance). We also need a set of $S$ configurations denoted by $\{\gamma_1, \ldots, \gamma_S\}$, where $\gamma_s = (v_{s1}, \ldots, v_{sP})$, $v_{sp}$ being the value of parameter $\pi_p$ in configuration $\gamma_s$. From each sample $\sigma_m$, we build a set $\{\sigma_{m1}, \ldots, \sigma_{mS}\}$ of samples that have encountered varied conditions when transmitted over the network: sequence $\sigma_{ms}$ is the sequence that arrived at the receiver when the sender sent $\sigma_m$ through the source-network system where the $P$ chosen parameters had the values of configuration $\gamma_s$.

Once the distorted samples are generated, a subjective test [5] is carried out on each received piece $\sigma_{ms}$. After a statistical screening of the answers (to eliminate “bad” observers), the sequence $\sigma_{ms}$ receives the value $\mu_{ms}$ (often a Mean Opinion Score, or MOS), the average of the values given to it by the set of observers. The idea is then to associate each configuration $\gamma_s$ with the value

$$\mu_s = \frac{1}{M} \sum_{m=1}^{M} \mu_{ms}.$$

At this step, we have a set of $S$ configurations $\gamma_1, \ldots, \gamma_S$, and we associate $\mu_s$ with configuration $\gamma_s$. We randomly choose $S_1$ configurations among the $S$ available. These, together with their values, constitute the “Training Database”. The remaining $S_2 = S - S_1$ configurations and their respective values constitute the “Validation Database”, reserved for further (and critical) use in the last step of the process.
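The two bookkeeping steps just described, averaging the screened scores into $\mu_s$ and randomly splitting the configurations into the Training and Validation Databases, amount to the following sketch (function names are hypothetical).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

C = TypeVar("C")


def configuration_scores(mos: Sequence[Sequence[float]]) -> List[float]:
    """mos[m][s] is the screened MOS of sample m under configuration s.

    Returns mu_s = (1/M) * sum_m mos[m][s] for every configuration s.
    """
    M = len(mos)
    S = len(mos[0])
    return [sum(mos[m][s] for m in range(M)) / M for s in range(S)]


def split_databases(configs: Sequence[C], scores: Sequence[float], s1: int,
                    seed: int = 0) -> Tuple[List[Tuple[C, float]], List[Tuple[C, float]]]:
    """Randomly put s1 (configuration, mu_s) pairs in the Training Database
    and the remaining S - s1 pairs in the Validation Database."""
    idx = list(range(len(configs)))
    random.Random(seed).shuffle(idx)
    pairs = [(configs[i], scores[i]) for i in idx]
    return pairs[:s1], pairs[s1:]
```

For the campaign described below, a call with S = 115 and `s1 = 90` would reproduce the roughly 90/25 split between training and validation.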

The next step is to train a specific statistical learning tool (an RNN) to learn the mapping between configurations and values as defined by the Training Database. Assume that the selected parameters have values scaled into $[0,1]$, and the same for quality. Once the tool has “captured” the mapping, that is, once the tool is trained, we have a function $f()$ from $[0,1]^P$ into $[0,1]$ that maps any possible value of the (scaled) parameters into the (also scaled) quality metric. The last step is the validation phase: we compare the value given by $f()$ at the point corresponding to each configuration $\gamma_s$ in the Validation Database to $\mu_s$; if they are close enough for all of them, the RNN is validated (in Neural Network theory, we say that the tool generalizes well). In fact, the results produced by the RNN are generally closer to the MOS than those of the human subjects (that is, the error is smaller than the average deviation between human evaluations). As the RNN generalizes well, it suffices to train it with a small (but well chosen) part of the configuration space, and it will be able to produce good assessments for any configuration in that space. The choice of the RNN as an approximator is not arbitrary. We have experimented with other tools, namely Artificial Neural Networks (ANN) and Bayesian classifiers, and found that RNNs perform better in the context considered. ANNs exhibited some performance problems due to overtraining, which we did not find when using RNNs. As for the Bayesian classifier, we found that while it worked, it did so quite roughly, with much less precision than the RNN. Besides, it is only able to provide discrete quality scores, while the neural network approach allows for a finer view of the quality function.
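To give a feel for what the trained function $f()$ computes, the sketch below evaluates a three-layer feed-forward Random Neural Network in Gelenbe's sense: each neuron's activity is $q = \lambda^+ / (r + \lambda^-)$, capped at 1, where $\lambda^+$ and $\lambda^-$ are the excitatory and inhibitory rates it receives. The topology, the choice of each firing rate $r$ as the sum of the neuron's outgoing weights, and all names are assumptions for illustration; the actual network used here and its training procedure are not described in this section, and the weights would come from training on the Training Database.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def rnn_forward(x: Sequence[float], wp_ih: Matrix, wm_ih: Matrix,
                wp_ho: Sequence[float], wm_ho: Sequence[float], r_out: float) -> float:
    """Evaluate a feed-forward Random Neural Network with one output neuron.

    x: scaled parameter values in [0, 1], used as exogenous excitatory rates.
    wp_ih[i][h] / wm_ih[i][h]: excitatory / inhibitory weights input i -> hidden h.
    wp_ho[h] / wm_ho[h]: excitatory / inhibitory weights hidden h -> output.
    r_out: firing rate of the output neuron. Weights are assumed positive.
    """
    n_in, n_hid = len(x), len(wp_ho)
    # Input layer: q_i = Lambda_i / r_i, with r_i the sum of outgoing weights.
    q_in = []
    for i in range(n_in):
        r_i = sum(wp_ih[i]) + sum(wm_ih[i])
        q_in.append(min(1.0, x[i] / r_i))
    # Hidden layer: q_h = sum_i q_i w+_ih / (r_h + sum_i q_i w-_ih).
    q_hid = []
    for h in range(n_hid):
        plus = sum(q_in[i] * wp_ih[i][h] for i in range(n_in))
        minus = sum(q_in[i] * wm_ih[i][h] for i in range(n_in))
        r_h = wp_ho[h] + wm_ho[h]
        q_hid.append(min(1.0, plus / (r_h + minus)))
    # Output neuron: its activity is the (scaled) quality estimate.
    plus = sum(q_hid[h] * wp_ho[h] for h in range(n_hid))
    minus = sum(q_hid[h] * wm_ho[h] for h in range(n_hid))
    return min(1.0, plus / (r_out + minus))
```

Validation then amounts to comparing `rnn_forward(scaled_gamma_s, ...)` with the scaled $\mu_s$ for every configuration in the Validation Database.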

For this study, we use an RNN trained with results from a subjective testing campaign carried out with 17 subjects. The subjects were presented with 115 sets of speech samples that had been generated using the Robust Audio Tool (RAT [13]), corresponding to different network and coding configurations. A MOS test was performed and the results were screened as per [5]. About 90 of the results obtained were used to train the RNN, and the remaining ones were used to validate it. The parameters considered for our experiment are listed in Table 1 and described below.

Codec - the primary codec (16 bit linear PCM, and GSM), 

FEC - the secondary codec (GSM), if any, 



Table 1. Network and encoding parameters and values used

Parameter                  Values
Loss rate                  0% ... 15%
Mean loss burst size       1 ... 2.5
Codec                      PCM Linear 16 bits, GSM
FEC                        ON (GSM)/OFF
FEC offset                 1 ... 3
Packetization interval     20, 40, and 80 ms
FEC offset - the offset, in packets, of the redundant data (we used offsets of 1 and 3 packets for the Forward Error Correction (FEC) scheme presented in [14,15]).

Packetization Interval (PI) - the length (in milliseconds) of audio contained in each packet (we considered packets containing 20, 40 and 80 ms).

Packet loss rate - the percentage of lost packets (2, 7, 10 and 15%).

Mean loss burst size - the average size of packet loss bursts (1, 1.7 and 2.5 packets); we consider this parameter to obtain a finer view of the packet loss process than the one given by the packet loss rate alone.
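The interplay between the loss parameters and the FEC offset can be explored with a small sketch. It assumes a two-state (Gilbert) Markov loss model, which is not stated in the text, parameterized by the packet loss rate and the mean loss burst size, and counts frame n as recovered when packet n + offset arrives, since that packet carries the redundant copy of frame n. All names are hypothetical, and the lower quality of the redundant (GSM) copy is ignored.

```python
import random
from typing import List


def gilbert_losses(n: int, loss_rate: float, mean_burst: float, seed: int = 0) -> List[bool]:
    """Loss pattern (True = lost) from a two-state Markov (Gilbert) model.

    Every packet sent in the 'bad' state is lost. With q = P(bad -> good) and
    p = P(good -> bad), the mean loss burst size is 1/q and the loss rate p/(p+q).
    Requires 0 < loss_rate < 1 and mean_burst >= 1.
    """
    q = 1.0 / mean_burst
    p = loss_rate * q / (1.0 - loss_rate)
    rng = random.Random(seed)
    bad = rng.random() < loss_rate      # start from the stationary distribution
    lost: List[bool] = []
    for _ in range(n):
        lost.append(bad)
        bad = (rng.random() >= q) if bad else (rng.random() < p)
    return lost


def residual_loss(lost: List[bool], offset: int) -> float:
    """Fraction of frames still missing when packet n + offset carries a
    redundant copy of frame n, as in media-dependent FEC."""
    missing = 0
    for n, was_lost in enumerate(lost):
        if was_lost:
            carrier = n + offset
            if carrier >= len(lost) or lost[carrier]:
                missing += 1
    return missing / len(lost)
```

For bursty patterns (mean burst size above 1), comparing `residual_loss(lost, 1)` with `residual_loss(lost, 3)` should show the larger offset recovering more frames. The sketch deliberately ignores the extra load that FEC puts on the network, which is what the model of the next section adds.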

So, in order to obtain an estimation of the perceived quality, all that is needed 
is to feed the trained RNN with values for those parameters, and it will output 
a MOS estimation, very close to the actual MOS. 

3 The Network Model 

The tool described in the last section gives us a way to explore the perceived quality as a function of the 6 selected parameters. For instance, it allows us to plot MOS against, say, packet loss rate in different cases (parameterized by the 5 remaining parameters). For performance evaluation purposes, we want to know the impact on quality of typical traffic parameters (such as throughputs) and of parameters related to dimensioning (such as windows, buffer sizes, etc.). This gap is bridged here by adding a network model.

In this paper, we consider a very simple network model, much like the one presented in [16,17]. It consists of an M/M/1/H queue which represents the bottleneck router in the network path considered. In spite of its simplicity, this model allows us to capture the way FEC affects perceived quality. Moreover, it appears to be quite robust (see the comments in Section 3.3). We consider two classes of packets, namely audio packets (class 1) and background traffic (class 2). Audio packets may or may not carry FEC, but we assume that if FEC is on, then all audio flows are using it. Our router has a drop-tail buffer policy, which is common in the current Internet.



3.1 Transmission Without FEC 

First, consider the case of audio without FEC. We take the standard environment of the M/M/1/H case: Poisson arrivals, exponential services, and the usual independence assumptions. The arrival rate of class-$i$ units is $\lambda_i$ pps (packets per second), and the link has a transmission capacity of $c$ bps. The average length of class-$i$ packets is $B_i$ bits. In order to be able to use analytical expressions, the model uses the global average length of the packets sharing the link, $\bar{B}$, given by $\bar{B} = \alpha_1 B_1 + \alpha_2 B_2$, where $\alpha_i = \lambda_i/\lambda$, with $\lambda = \lambda_1 + \lambda_2$. The service rate of the link in pps is then $\mu = c/\bar{B}$.

Let us assume that the buffer associated with the link has a capacity of $N$ bits. Then, in packets, its capacity is taken to be $H = N/\bar{B}$. To simplify the analysis, we use the expressions for the performance metrics of the M/M/1/H model even if $H$ is not an integer; this does not significantly change the results and makes the exposition considerably clearer.

If we denote $\varrho = \lambda/\mu$, then the packet loss probability $p$ is

$$p = \frac{1 - \varrho}{1 - \varrho^{H+1}}\, \varrho^{H}$$
(assuming $\varrho \neq 1$). We also need to compute the mean size of loss bursts for audio packets, since the correlation between losses influences speech quality. Here, we must discuss different possible definitions of the concept of loss burst, because of the multi-class context. To this end, let us adopt the following code for the examples used in the discussion: a chunk of the arrival stream at the link is denoted by “. . . , x, y, z, . . . ”, where each symbol is equal to $i$ if a class-$i$ packet arrives and is accepted (the queue was not full), and to $\bar{i}$ if the arrival is a class-$i$ one and it is rejected (because there was no room for it).

Assume a packet of class 1 arrives at a full queue and is lost, and assume that the previous class-1 packet was not lost. Then, a burst of class-1 losses starts. Strictly speaking, the burst is composed of a set of consecutive audio packets that are all lost, whatever happens to the class-2 packets arriving between them. For instance, in the path “. . . , 1, 2, $\bar{1}$, $\bar{1}$, 2, 2, 1, 2, 2, 2, 1, . . . ” there is a class-1 burst with loss burst size LBS = 3. With this definition of burst, when audio packets are a small fraction of the total offered traffic, the effective impact of correlation can be exaggerated. Even when there are many audio packets in the global arrival process, allowing class-2 packets to merge inside audio loss bursts can be excessive. At the other extreme, we can define an audio loss burst as a consecutive set of class-1 arrivals finding the buffer full, without class-2 packets between them. In the path shown before, with this burst definition, there is a first burst of class-1 losses of size 2, then a second one composed of only one packet. Now consider the piece of path “. . . , 1, 2, 2, $\bar{1}$, 1, 1, 2, 2, $\bar{1}$, 2, 2, 2, $\bar{1}$, $\bar{1}$, 1, . . . ”. An intermediate definition consists of accepting class-2 packets inside the same audio burst only if they are also lost (because this corresponds basically to the same congestion period). In the last example, this means that we have a loss burst of size 4. We keep this last definition for our analysis.

Let us denote by LBS the loss burst size (recall that we focus only on class-1 units). The probability that a burst has size strictly greater than $n$ is the probability that, after a class-1 loss, the following $n$ class-1 arrivals are losses, accepting any number of class-2 losses between them. This means



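$$\Pr(\mathrm{LBS} > n) = \psi^{n},$$

where

$$\psi = \sum_{k \ge 0} \left( \frac{\lambda_2}{\lambda_1 + \lambda_2 + \mu} \right)^{k} \frac{\lambda_1}{\lambda_1 + \lambda_2 + \mu} = \frac{\lambda_1}{\lambda_1 + \mu}.$$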



The last relationship comes from the fact that we allow any number of class-2 units to arrive between two class-1 losses, as long as they are lost (that is, while the burst is not finished, no departure from the queue is allowed). Using the value of $\psi$, we have



E(LBS) = ^Pr(LBS > n) = ^ 



n> 0 



n> 0 



Ai 

Ai + /i 



= i + U 

ft 
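As a quick numerical check of the two closed forms above, the following Python sketch evaluates the series for $p$ and for $E(\mathrm{LBS})$ directly (truncated to a finite number of terms) and compares them with $\lambda_1/(\lambda_1+\mu)$ and $1 + \lambda_1/\mu$. The function names and the rates in the example are illustrative choices, not parameters taken from the study.

```python
def p_series(lam1: float, lam2: float, mu: float, terms: int = 10_000) -> float:
    """Truncated sum over k >= 0 of (lam2/(lam1+lam2+mu))^k * lam1/(lam1+lam2+mu)."""
    total = lam1 + lam2 + mu
    ratio = lam2 / total   # next event is a class-2 arrival (lost, buffer still full)
    first = lam1 / total   # next event is a class-1 arrival (lost, burst continues)
    return sum(ratio**k * first for k in range(terms))


def p_closed(lam1: float, mu: float) -> float:
    """Closed form lam1/(lam1+mu): the class-2 rate cancels out."""
    return lam1 / (lam1 + mu)


def mean_lbs_series(p: float, terms: int = 10_000) -> float:
    """Truncated E(LBS) = sum over n >= 0 of Pr(LBS > n) = sum of p^n."""
    return sum(p**n for n in range(terms))


def mean_lbs_closed(lam1: float, mu: float) -> float:
    """Closed form 1 + lam1/mu."""
    return 1.0 + lam1 / mu


if __name__ == "__main__":
    lam1, lam2, mu = 0.3, 0.9, 1.0  # illustrative rates, same time unit
    print(p_series(lam1, lam2, mu), p_closed(lam1, mu))
    print(mean_lbs_series(p_closed(lam1, mu)), mean_lbs_closed(lam1, mu))
```

Note that the class-2 rate $\lambda_2$ only affects how many lost class-2 packets are interleaved in a burst, not the number of class-1 losses it contains.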



3.2 Transmission With FEC 

If FEC is used, each audio packet has supplementary data, and we denote this
overhead by $r$. If $B_1'$ is the mean audio packet length when FEC is used, then
$B_1' = B_1 (1 + r)$. The rest of the parameters are computed as before. We have
$B' = \alpha_1 B_1' + \alpha_2 B_2$, $\mu' = c/B'$, $H' = N/B'$ and $\varrho' = \lambda/\mu'$. This leads to the
corresponding expression for the loss probability, $p' = {\varrho'}^{H'} (1 - \varrho') / (1 - {\varrho'}^{H'+1})$,
and $E(\mathrm{LBS}') = 1 + \lambda_1/\mu'$.
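The following Python sketch puts these formulas together to compare loss probability and mean loss burst size with and without FEC. It assumes that the "load" parameter is the offered load without FEC, so the packet arrival rate $\lambda$ stays fixed while $\mu$ and $H$ shrink when audio packets grow; the `Scenario` fields, the byte-based units and the example values (T3 line, 200 ms of buffering, 320 B audio payloads, $r = 0.1$) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Scenario:
    capacity_bytes_s: float   # c, link capacity (bytes per second)
    buffer_bytes: float       # N, buffer size (bytes)
    load: float               # rho = lambda / mu, measured without FEC (assumption)
    voice_fraction: float     # alpha_1, fraction of packets that are audio
    audio_bytes: float        # B_1, mean audio packet length without FEC
    background_bytes: float   # B_2, mean background packet length


def mm1h_loss(rho: float, h: float) -> float:
    """Blocking probability rho^H (1 - rho) / (1 - rho^(H+1)) of an M/M/1/H queue."""
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (h + 1.0)
    if rho > 1.0:
        # Same expression divided through by rho^(H+1), to avoid overflow for large H.
        inv = 1.0 / rho
        return (1.0 - inv) / (1.0 - inv ** (h + 1.0))
    return rho**h * (1.0 - rho) / (1.0 - rho ** (h + 1.0))


def evaluate(s: Scenario, r: float = 0.0) -> Tuple[float, float]:
    """Return (loss probability, E(LBS)) when audio packets grow by a factor (1 + r)."""
    a1, a2 = s.voice_fraction, 1.0 - s.voice_fraction
    # The packet arrival rate lambda is fixed by the load without FEC.
    lam = s.load * s.capacity_bytes_s / (a1 * s.audio_bytes + a2 * s.background_bytes)
    b1 = s.audio_bytes * (1.0 + r)              # B_1' = B_1 (1 + r)
    b = a1 * b1 + a2 * s.background_bytes       # B' = a1 B_1' + a2 B_2
    mu = s.capacity_bytes_s / b                 # mu' = c / B'
    h = s.buffer_bytes / b                      # H' = N / B'
    loss = mm1h_loss(lam / mu, h)               # rho' = lambda / mu'
    return loss, 1.0 + a1 * lam / mu            # E(LBS') = 1 + lambda_1 / mu'


if __name__ == "__main__":
    t3 = Scenario(capacity_bytes_s=45e6 / 8, buffer_bytes=45e6 * 0.2 / 8,
                  load=1.0, voice_fraction=0.3, audio_bytes=320, background_bytes=600)
    print("no FEC:", evaluate(t3))
    print("FEC   :", evaluate(t3, r=0.1))
```

The $\varrho > 1$ branch is only a rearrangement of the same M/M/1/H formula; it matters because $H$ is on the order of a few thousand packets here.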



3.3 About the Model Robustness 

The simplicity of the classical single-class M/M/1/H model hides its capacity to
capture the dynamics of the system. We also explored the more directly
multi-class M/M/1/H FIFO queue, where the service rate for class-$i$ packets is
$\mu_i$. This model can be easily analyzed numerically by writing and solving the
associated equilibrium equations.

Let us denote

$q_i = \Pr(\text{in steady state, the queue is saturated and a class-}i\text{ packet is being transmitted})$.

Then, using the previous definition of class-$k$ loss bursts, their average length
$E(\mathrm{LBS}_k)$ is derived exactly as before. First, conditioned on the class of the packet
being transmitted,

$$E(\mathrm{LBS}_k \mid \text{a class-}i\text{ packet is being transmitted}) = 1 + \lambda_k/\mu_i.$$

Then, given the low loss rates considered, we average these conditional mean loss
burst lengths for class $k$ as

$$E(\mathrm{LBS}_k) = \frac{\sum_i q_i \, (1 + \lambda_k/\mu_i)}{\sum_i q_i}.$$

We explored the perceived quality as described before within this multi-class
model, and the numerical results we obtained were very similar to those presented
in the next section.
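One way to write and solve these equilibrium equations is sketched below in Python. It assumes a state description of our own choosing: either the empty queue, or the pair (number of packets present, class of the packet in service). This is enough because the classes of waiting packets are i.i.d. marks of the Poisson arrivals, so when a packet leaves, the next one in service is class $j$ with probability $\lambda_j/\lambda$. The sketch computes the $q_i$ and then $E(\mathrm{LBS}_k)$ as the $q_i$-weighted average of $1 + \lambda_k/\mu_i$; a small hand-written Gaussian elimination keeps it dependency-free.

```python
from typing import List


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def saturation_probs(lams: List[float], mus: List[float], h: int) -> List[float]:
    """q_i = Pr(queue full and a class-i packet in service), multi-class FIFO M/M/1/H.

    State: 0 (empty) or (n, i) = (n packets present, class i in service).
    """
    k, lam = len(lams), sum(lams)
    size = 1 + h * k

    def idx(n: int, i: int) -> int:
        return 1 + (n - 1) * k + i

    gen = [[0.0] * size for _ in range(size)]  # infinitesimal generator

    def add(src: int, dst: int, rate: float) -> None:
        gen[src][dst] += rate
        gen[src][src] -= rate

    for i in range(k):
        add(0, idx(1, i), lams[i])              # arrival to an empty queue
    for n in range(1, h + 1):
        for i in range(k):
            s = idx(n, i)
            if n < h:
                add(s, idx(n + 1, i), lam)      # arrival; lost when n == h
            if n == 1:
                add(s, 0, mus[i])               # departure empties the queue
            else:
                for j in range(k):              # next packet in service is class j
                    add(s, idx(n - 1, j), mus[i] * lams[j] / lam)
    # Balance equations pi * gen = 0, with the last one replaced by sum(pi) = 1.
    a = [[gen[r][c] for r in range(size)] for c in range(size)]
    a[-1] = [1.0] * size
    b = [0.0] * (size - 1) + [1.0]
    pi = solve_linear(a, b)
    return [pi[idx(h, i)] for i in range(k)]


def mean_lbs(qs: List[float], lams: List[float], mus: List[float], k: int) -> float:
    """E(LBS_k) = sum_i q_i (1 + lam_k / mu_i) / sum_i q_i."""
    return sum(q * (1.0 + lams[k] / mu) for q, mu in zip(qs, mus)) / sum(qs)


if __name__ == "__main__":
    lams, mus = [0.3, 0.5], [1.5, 0.8]  # illustrative rates: class 1 = audio
    qs = saturation_probs(lams, mus, h=10)
    print(sum(qs), mean_lbs(qs, lams, mus, k=0))
```

With equal service rates the chain collapses to the single-class M/M/1/H queue, which gives a convenient sanity check: $\sum_i q_i$ must then equal $\varrho^H(1-\varrho)/(1-\varrho^{H+1})$.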



4 The Impact of FEC on VoIP Flows 

Because the real-time requirements of VoIP make it impossible to retransmit lost
packets, the main idea of using FEC is to mitigate the effect of the losses by
transmitting the information contained in each packet more than once. To this
end, we send one (maybe more) copy of packet n’s contents in packet n + i
(n + i + 1, and so on if several copies are sent), i being variable. The extra
copies are normally compressed at a lower bit rate, so as to minimize the
increase in bandwidth usage. When a packet is lost, if the packet containing
the redundant copy of the lost one arrives in a timely fashion, the receiver can
use said copy to recover from the loss, in a way that minimizes the perceived
quality degradation. The offset i is made variable because of the bursty nature of
packet losses on IP networks [18,19]: choosing it appropriately improves the
performance of the FEC scheme described above by avoiding the loss of all copies
of a given voice packet in the same loss burst. However, i should remain as close
to 1 as possible, in order to minimize the increase of the end-to-end delay, which
is known to degrade the quality of a two-way VoIP session.
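The recovery rule just described can be illustrated with a short Python sketch for the case of a single redundant copy carried at offset i. The function name and the boolean trace format (True meaning the packet arrived in time) are assumptions made for this example; the sketch only counts which packets remain unplayable and ignores the lower quality of the recovered copies.

```python
from typing import List


def unrecoverable(received: List[bool], offset: int = 1) -> List[int]:
    """Indices of packets whose audio cannot be played out.

    Packet n is lost for good if it did not arrive and packet n + offset,
    which carries its redundant low-bit-rate copy, did not arrive either.
    Copies for the last `offset` packets would travel in packets beyond the
    trace, so those losses are counted as unrecoverable here.
    """
    n_pkts = len(received)
    return [
        n for n, ok in enumerate(received)
        if not ok and not (n + offset < n_pkts and received[n + offset])
    ]


if __name__ == "__main__":
    # A loss burst of two packets (indices 2 and 3).
    trace = [True, True, False, False, True, True, True]
    print(unrecoverable(trace, offset=1))  # [2]: its copy was in packet 3, also lost
    print(unrecoverable(trace, offset=2))  # []: copies in packets 4 and 5 arrived
```

With an offset of 1, a burst of two losses always leaves one packet unrecovered; a larger offset avoids this, at the cost of the extra delay mentioned above.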

In our study, we focus on the perceived audio quality, and so it is sufficient 
to consider only one-way flows. Therefore, we didn’t consider the effects of FEC 
on the end-to-end delay. 

In order to assess the impact of FEC at the network loss level, we must
consider two factors:

— the amount of redundancy introduced by the use of FEC (and therefore the 
increase in packet size for the flows using FEC), 

— the proportion of flows using FEC, which allows us to derive the new network 
load. 

While the increase in size is likely to be small, roughly equal across different
applications, and certainly bounded by a factor of 1 (that is, FEC at most doubles
the packet size), the number of flows using FEC is very hard to determine.
Estimates of the number of VoIP flows on the Internet are not readily available in
the literature, but some estimates [20] say that UDP traffic only accounts for about
15 to 20% of the total network traffic. Even then, VoIP is probably still a modest
fraction of that UDP traffic (some studies [21] suggest that about 6% of traffic
corresponds to streaming applications). However, since VoIP applications have a
growing user base, it seems reasonable that in some time they may account for a
higher fraction of the total Internet traffic.

We studied different scenarios, with increasing volumes of VoIP traffic, in order
to assess the impact of FEC on the quality perceived by the end-user. For
simplicity’s sake, we assumed that if FEC is in use, then all VoIP flows are using it.

For our example, we consider a T3-type line, which runs at 45Mbps (actual
speeds are not really relevant, since we will concern ourselves with the load of
the network; the numbers are used to make the example concrete). In practice,
buffer space in core routers is limited not by physical memory but by the
administrators, who may wish to minimize delays (while this implies that the loss
rate is higher than it could be, it is important to make TCP flows behave properly).
The maximum queueing delay is normally limited to a few hundred milliseconds [22].
We will use a value of 200ms (which requires a buffer space of N = 45Mbps * 0.2s =
9Mbits, or about 1965 packets of 600B), which is on par with current practices.
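The buffer arithmetic can be reproduced with a few lines of Python. With decimal megabits (1 Mbit = 10^6 bits), 9 Mbit holds exactly 1875 packets of 600 B; the figure of about 1965 packets matches the same computation with 1 Mbit taken as 2^20 bits (about 1966 packets). Which convention the authors used is our inference, and the constant names below are illustrative.

```python
LINK_BPS = 45e6       # T3 line rate (bits per second)
MAX_DELAY_S = 0.2     # maximum queueing delay used in the text
PACKET_BYTES = 600    # mean packet size B


def buffer_packets(buffer_bits: float, packet_bytes: float) -> float:
    """Number of packets of mean size packet_bytes that fit in buffer_bits."""
    return buffer_bits / (8 * packet_bytes)


if __name__ == "__main__":
    bits = LINK_BPS * MAX_DELAY_S                    # 9 Mbit
    print(buffer_packets(bits, PACKET_BYTES))        # decimal megabits
    print(buffer_packets(9 * 2**20, PACKET_BYTES))   # 1 Mbit taken as 2**20 bits
```

Either way the buffer holds roughly two thousand average-size packets, which is what matters for the loss model.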
Table 2. Packet size distribution

Size (Bytes)    Probability
40              0.50
512             0.25
1500            0.24
9180            0.01

Some studies [23] show that a large percentage (about 60%) of packets
have a size of about 44B, and that about 50% of the volume of bytes transferred
is carried in packets of 1500B or more. Table 2 shows the distribution
we chose for our background traffic packet sizes, which yields an average packet
size $B_2$ of about 600B.
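The average background packet size can be checked directly from Table 2. This Python sketch reads the first row as 40 B with probability 0.50, which is the only value that makes the probabilities sum to 1 and is consistent with the stated mean of about 600 B.

```python
from typing import Dict

# Table 2, read as packet size (bytes) -> probability.
TABLE_2 = {40: 0.50, 512: 0.25, 1500: 0.24, 9180: 0.01}


def mean_packet_size(dist: Dict[int, float]) -> float:
    """Mean packet size of a discrete size distribution."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(size * prob for size, prob in dist.items())


if __name__ == "__main__":
    print(mean_packet_size(TABLE_2))  # close to 599.8 bytes
```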

We considered that network load varies between 0.5 and 1.15. Smaller values 
have no practical interest, since in the model we used, they result in negligible 
loss rates, and higher load values result in loss rates above 15%, which typically 
yield unacceptable quality levels. 

As for the fraction of VoIP traffic, we studied different values, between 5% 
and 50% of the total packet count. Granted, the higher values are currently 
unrealistic, but they may make sense in a near future if more telephony providers 
move their services toward IP platforms, and other VoIP applications gain more 
acceptance. 

We chose to use PCM audio with GSM encoding for FEC, and an offset
of one packet for the redundant data. This is a somewhat pessimistic choice, since
sample-based codecs are not very efficient for network transmission, but it gives us
a worst-case scenario of sorts (since increasing the fraction of audio traffic
does indeed increase network load). In this case, the increase in packet size when
using FEC is about 10%, which gives payloads of 353B, against 320B for
PCM-only traffic, for 20ms packets. We also tried GSM/GSM, which results in
much smaller packets (66B and 33B with and without FEC, respectively),
but found that the results are qualitatively similar to those of PCM, and so we
will discuss only the PCM ones.
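The payload sizes quoted above can be reproduced with a small Python sketch. Two assumptions are ours: PCM is taken as 16-bit linear samples at 8 kHz (which is what 320 B per 20 ms implies), and each GSM full-rate frame (260 bits per 20 ms) is carried in 33 bytes. Any per-block headers that a redundancy payload format would add are ignored, as in the figures in the text.

```python
from typing import Callable, Optional

SAMPLE_RATE_HZ = 8000      # narrowband telephony sampling rate
PCM_BYTES_PER_SAMPLE = 2   # assumption: 16-bit linear PCM (matches 320 B per 20 ms)
GSM_FRAME_MS = 20          # GSM full-rate frame duration
GSM_FRAME_BYTES = 33       # one 260-bit GSM full-rate frame, padded to 33 bytes


def pcm_bytes(interval_ms: int) -> int:
    return SAMPLE_RATE_HZ * interval_ms // 1000 * PCM_BYTES_PER_SAMPLE


def gsm_bytes(interval_ms: int) -> int:
    return interval_ms // GSM_FRAME_MS * GSM_FRAME_BYTES


def payload_bytes(primary: Callable[[int], int],
                  redundant: Optional[Callable[[int], int]],
                  interval_ms: int) -> int:
    """Primary encoding of this interval plus, optionally, a redundant copy of the
    previous interval (offset of one packet). Per-block headers are ignored."""
    size = primary(interval_ms)
    if redundant is not None:
        size += redundant(interval_ms)
    return size


if __name__ == "__main__":
    pcm_only = payload_bytes(pcm_bytes, None, 20)
    pcm_gsm = payload_bytes(pcm_bytes, gsm_bytes, 20)
    print(pcm_only, pcm_gsm, pcm_gsm / pcm_only - 1)  # 320 353 and about 0.103
```

The same functions give 66 B and 33 B for GSM/GSM with and without FEC.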



4.1 Assessing the Impact of FEC on the Perceived VoIP Quality 

We present our results as a series of curves, plotting the estimated MOS values 
against network load, and for different values of the proportion of voice packets. 

As can be seen in the curves in Figure 1, using FEC is beneficial for the
perceived quality in all the conditions considered. When the proportion of voice
traffic becomes large (> 30%), the performance of the FEC protection decreases.
We believe that this is related to the fact that a higher proportion of voice packets
implies a higher mean loss burst size (which is still smaller than 2 in our model
for the conditions considered), and since we chose an offset of 1 packet for the
FEC, it is logical to see a slight decrease in performance. We did not find,
however, the negative impact of FEC on the perceived quality that was predicted
in [16,17]. Even when most of the packets carry voice with FEC, we found that it
is better to use FEC than not to use it. This might not hold for extremely high
values of the network load, but in that case quality will already be well below
acceptable values, with or without FEC, so the question is moot.

Fig. 1. MOS as a function of network load for different fractions of voice traffic, with
and without FEC (20ms packets)

The second strong fact coming from these numerical values is that the qual- 
itative behaviour of perceived quality when load increases is basically the same 
in all cases, with a significant degradation when load approaches 1 and beyond. 

We also found that using a larger packetization interval can help improve the
perceived quality. We tried doubling the packet size, obtaining 40ms packets
(20ms and 40ms packets are commonly used in telephony applications), and
obtained a slight increase in quality even for higher proportions of audio
packets (25% of audio packets, which corresponds to the same number of flows
as a 50% proportion of 20ms audio packets). This can be seen in Figure 2. While
increasing the packetization interval is beneficial in one-way streams, further
study is needed to determine whether the higher delays it creates offset these
benefits.



Fig. 2. MOS as a function of network load for 25% of voice traffic, with and without
FEC (40ms packets)



5 Conclusions 

In this paper we analyze the effect of media-dependent FEC on one-way speech 
streams. To this end, we studied the effects of the increase in network load 
generated by adding FEC to voice streams, using a simple queueing model to
represent the bottleneck router, in a similar fashion as in [16,17].

In order to estimate the voice quality perceived by the end-user, we used a 
method we have recently proposed, based on Random Neural Networks trained 
with data obtained from subjective quality assessments. We fed the RNN with
coding parameters and with packet loss parameters derived from the network 
model, and obtained MOS estimates for each configuration. 

We considered a range of network loads that yielded reasonable loss rates, 
and several values for the fraction of packets corresponding to speech flows. The 
results obtained indicate that, as expected, the use of FEC is always beneficial 
to the perceived quality, provided that the network parameters stay within 
reasonable ranges. Our approach makes it possible to provide actual quantitative
estimates of this gain, as a function of different parameters. In this paper we
focused on the network load, but a similar analysis can be done to explore many
other interesting aspects, such as delay/jitter for interactive flows.

One of the strong points of this approach is the coupling of an accurate technique
to assess perceived quality (avoiding the use of abstract “utility functions”)
with a model of the network that provides information about the loss process.
If for some reason the model used here is later considered inappropriate, one can
move to some other (possibly more detailed) representation of the bottleneck
(or perhaps to a tandem queueing network corresponding to the whole path
followed by the packets), and use the same approach to finally map traffic
parameters to end-to-end quality as perceived by the final user.
Abstract. This paper proposes an architecture for end-to-end QoS for VoIP call
processing in the MPLS (Multiprotocol Label Switching)-based NGN (Next
Generation Network). End-to-end QoS for VoIP call processing in the MPLS
network depends on both the routers and the VoIP servers. However, QoS is not
presently supported in the VoIP server. In order to resolve this problem, this
paper proposes QoS resource management and differentiated call processing
technology by extending the VoIP signaling protocol SIP (Session Initiation
Protocol). QoS resource management coordinates service priority in call
processing, while differentiated call processing technology processes calls,
applying the service priority negotiated in the SIP server. Service priority is set
up through the flow label field of the IPv6 header, considering future MPLS
label mapping. A performance analysis shows that there is a considerable
difference in end-to-end call setup delay depending on service priority when
setting up SIP calls in the MPLS network. By applying this result to call setups
that require real-time or emergency response, we will be able to ensure excellent
performance in the future NGN environment.



1 Introduction 

With the fast development of network technology, the current network architecture is 
evolving into NGN (Next Generation Network) that allows transfer of broadband 
multimedia to meet the various needs of users. VoIP, which is one of the core
technologies of NGN, requires QoS (Quality of Service) support, such as MPLS
(Multiprotocol Label Switching) [1] or DiffServ (Differentiated Service) [2], to 
ensure voice service quality comparable to the quality provided by the PSTN (Public 
Switched Telephone Network). 
VoIP is a technology that carries the voice service of the PSTN over IP networks.
Compared with the PSTN, it has the advantage of providing long-distance or overseas
call services at relatively low cost [3]. VoIP’s core signaling protocols are H.323
and SIP (Session Initiation Protocol) [4] [5]. At present, the text-based SIP has been
adopted as the standard for NGN. It is also expected to be integrated or to provide
compatibility with current services such as web, instant messaging, SNMP
(Simple Network Management Protocol) and SMTP (Simple Mail Transfer
Protocol) services [6] [7]. Therefore, SIP must provide a service quality better than the
quality provided by the PSTN for call setup in NGN, and offer priority-based call
processing, depending on the traffic properties of application services.

The VoIP service in NGN must guarantee call quality for voice data transfer and
call setup quality for call setup [8], and the latter must be ensured before the former.
Call setup is completed before voice data are transferred. While a partial loss of
voice data does not cause any serious problem, even a one-bit error or a long delay
during call setup makes the call setup fail, and fatal problems may arise.

MPLS-based NGN guarantees end-to-end QoS for voice data transfer, since voice
data pass only through the routers in the MPLS network. Call setup, on the other
hand, is not guaranteed end-to-end QoS, since signaling passes through both the
MPLS routers and multiple SIP servers. The routers in the MPLS network guarantee
QoS, but the SIP servers, which process call authentication, address translation,
and call routing, do not provide QoS for call processing, so end-to-end QoS is not
guaranteed.

In order to resolve this problem, this paper proposes the architecture of end-to-end 
QoS for VoIP call processing in an MPLS-based NGN, which supports QoS resource 
management and differentiated call processing technology by extending VoIP 
signaling protocol SIP. QoS resource management coordinates service priority in call 
processing, while differentiated call processing technology processes calls, applying 
the service priority negotiated in the SIP server. Service priority is set up through the 
flow label field of the IPv6 header [9]. A performance analysis shows that there is a
considerable difference in end-to-end call setup delay depending on service priority, 
in setting up SIP calls in the MPLS network. 

This paper is composed of six chapters. The second chapter examines relevant 
research, and the third chapter explores system design. The fourth chapter explains 
implementation, and the fifth chapter explores performance experiments, and finally, 
the sixth chapter provides a conclusion and comments on future research tasks. 



2 QoS Architecture for VoIP in the NGN [8] 

The VoIP service in NGN must guarantee end-to-end QoS. VoIP end-to-end QoS is 
classified into call quality and call setup quality. Call quality refers to voice data 
security, speech coding distortion, terminal noise and overall delay including delays 
caused by packetization, buffering, and codecs. The call quality can be guaranteed by 
QoS technology, including MPLS, DiffServ, and IntServ (Integrated Service). On the
other hand, call setup quality refers to guaranteeing call setup, which is classified into
call setup delay in the network and call setup delay in the VoIP server (SIP server). Call
setup delay in the network can be guaranteed through QoS technology, such as MPLS
and DiffServ. However, there is no standard technology that ensures call setup delay
in the VoIP server. Therefore, we need various forms of QoS mechanisms in the
VoIP server to guarantee end-to-end QoS for call setup quality.



3 Architecture of End-to-End QoS for SIP Call Signaling 

This paper is based on the MPLS-based NGN environment. Fig. 1 shows the 
architecture of end-to-end QoS for VoIP call signaling, proposed in this paper. VoIP 
calls are transferred through the predetermined LSP via MPLS network routers and 
multiple SIP servers. This paper proposes QoS resource management and 
differentiated call processing, by extending SIP protocol, to support QoS in the VoIP 
server. 




Fig. 1. Architecture of End-to-end QoS for Call Processing in the MPLS Network

The call processing procedure based on end-to-end QoS is as follows:

1. The network manager sets up the LSP (Label Switched Path) to transfer voice and
signaling data through the LDP/CR-LDP protocol.

2. The SIP UA negotiates the desired service priority with the SIP server through QoS
resource management, prior to session setup. In this paper, service priority is
classified into premium level, assured level, and normal level.

3. The SIP UA transfers session setup messages to the SIP server. Each SIP server
along the SIP message path then processes the messages according to the service
priority requested by the user, through differentiated call processing technology.
As mentioned above, end-to-end QoS for SIP call setup is guaranteed through QoS
in the MPLS network and differentiated call processing in the SIP server. 



3.1 System Structure 

Fig. 2 shows SIP system structures with extended QoS resource management and 
differentiated call processing technology, as proposed in this paper. This system 
consists of the SIP6 server and the SIP6 user agent.
• SMP: SIP Message Parser
• QRM: QoS Resource Management
• QRR: QoS Resource Requester
• DQoSP: Differentiated QoS Processor
• SLS: SIP Location Server
• UAC: User Agent Client
• UAS: User Agent Server
• SPC: SIP Proxy Server
• SRS: SIP Redirect Server
Fig. 2. System structure of SIP Extension

The SIP6d daemon includes the proxy server for call forwarding service, the redirect
server for call redirect service, and the location server for user location and user
information registration services. The receiver receives SIP6 messages through the
IPv6 UDP socket interface. The received messages are divided into Request and
Response message types and are sent to the SIP message parser. The SIP message
parser parses SIP messages, and calls the SIP server or QoS resource management,
depending on the method and header type. The differentiated QoS processor applies
differentiated call processing technology (see Sect. 3.3) to all incoming messages.
The differentiated call processing technology supports differentiated call processing,
based on the service priority negotiated in the SIP server. QoS resource management
(see Sect. 3.2) coordinates service priority between the SIP UA and the SIP server,
using the method and header extensions defined in this paper. The user database
manages various user information, including service priority, user-ID, and user location.

The SIP6 user agent is composed of the UA server/client sending and receiving 
calls in order to establish the sessions, QoS resource requester for priority negotiation, 
and QoS marker that sets up the negotiated service priority for SIP messages. 
3.2 SIP Message Extension and Flow

This paper defines extensions to the SIP methods and headers, following the RFC
2543 message format and processing procedures [10][11], to support QoS resource
management and differentiated call processing technology. Table 1 shows the
extended SIP methods.



Table 1. SIP Method Extension

Method Type     Description
QOSREQUEST      The QOSREQUEST method is used by the SIP UA to negotiate
                service priority with the SIP server. It sets up the service
                priority requested by the user through the Qosinfo header field.
QOSWITHDRAW     The QOSWITHDRAW method is used by the SIP UA to nullify the
                negotiated service priority. It specifies the service priority
                to be nullified through the Qosinfo header field.

Table 2 shows the extended SIP header field in this paper, and each header field is 
included in the method defined in Table 1 and the SIP response message. 



Table 2. SIP Header and Header Option Extension

Header Type     Description
Qosinfo         Syntax formalism: Qosinfo: “desired/release” = “ServiceLevel”
                The “desired” header option is used to set up the service priority
                requested by the user in the QOSREQUEST method. It is also used
                to set up the negotiated service priority in 200 OK response
                messages. The “release” header option is used to release the
                service priority negotiated by the user in the QOSWITHDRAW
                method and 200 OK response messages.
Qosmark         Syntax formalism: Qosmark: “ok/no”
                This header is included in request/response messages transferred
                along the SIP message path. If the header option in the Qosmark
                header field is “ok,” the SIP server applies differentiated call
                processing technology. If it is “no,” the SIP server does not.

Fig. 3 shows the SIP message processing procedure extended in this paper. The SIP UA
negotiates service priority with the SIP server through a QOSREQUEST message prior
to session setup. The SIP server stores the best available service priority. Using this
information, it authenticates and assigns the service priority requested by the user.
After negotiating service priority, the SIP UA sets up the session, beginning with the
INVITE message, by setting the service priority in the flow label field of the IPv6
header. In this process, differentiated call processing technology is applied. After all
voice calls are finished and the session ends, the SIP UA can nullify the service
priority through a QOSWITHDRAW message.
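To make the negotiation messages concrete, the Python sketch below builds a QOSREQUEST or QOSWITHDRAW request carrying a Qosinfo header and parses the header back. The exact wire formatting (`Qosinfo: desired=premium`), the lower-case level names and the example URIs are our assumptions based on the syntax formalism in Table 2; a real SIP request would also need Via and Max-Forwards headers, which are omitted here.

```python
from typing import Optional, Tuple

SERVICE_LEVELS = ("premium", "assured", "normal")
# Header option used with each extended method (following Table 2).
METHOD_OPTION = {"QOSREQUEST": "desired", "QOSWITHDRAW": "release"}


def build_qos_request(method: str, server_uri: str, user_uri: str,
                      call_id: str, cseq: int, level: str) -> str:
    """Build a minimal QOSREQUEST / QOSWITHDRAW message with a Qosinfo header.

    Via and Max-Forwards, required in a real SIP request, are omitted for brevity.
    """
    if method not in METHOD_OPTION:
        raise ValueError(f"unknown method {method!r}")
    if level not in SERVICE_LEVELS:
        raise ValueError(f"unknown service level {level!r}")
    lines = [
        f"{method} {server_uri} SIP/2.0",
        f"From: <{user_uri}>",
        f"To: <{user_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
        f"Qosinfo: {METHOD_OPTION[method]}={level}",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_qosinfo(message: str) -> Optional[Tuple[str, str]]:
    """Return (option, level) from the first Qosinfo header, or None if absent."""
    for line in message.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "qosinfo":
            option, _, level = value.strip().partition("=")
            return option.strip().lower(), level.strip().lower()
    return None


if __name__ == "__main__":
    msg = build_qos_request("QOSREQUEST", "sip:proxy.example.com",
                            "sip:alice@example.com", "a84b4c76e66710", 1, "premium")
    print(parse_qosinfo(msg))  # ('desired', 'premium')
```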
Fig. 3. SIP Extension Message Flow



3.3 Differentiated Call Processing Algorithm 

The differentiated call processing technology proposed in this paper is implemented
through priority scheduling at the application level. Fig. 4 shows the
differentiated call processing algorithm.




Fig. 4. Differentiated Call Processing Algorithm 










50 C. Kim et al. 



The received SIP messages are stored in the message processing buffer. The
admission controller uses the Qosmark header field to determine whether to classify
them. If there is no Qosmark header option, it performs the priority authentication
procedure negotiated with the user, and then inserts the Qosmark header field into
the SIP messages. The classifier classifies the SIP messages processed by the
admission controller into three buffers, according to the value set in the flow label
field of the IPv6 header. The scheduler accesses these three buffers and arranges the
SIP messages in an execution buffer in order of priority. The SIP messages arranged
in the execution buffer are read consecutively by the SIP message parser to begin
the SIP transaction.
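The admission controller, classifier and scheduler described above can be sketched in Python as follows. The paper does not give the flow label encoding, so the mapping from flow label values to priorities is hypothetical, and we assume strict priority scheduling (all premium messages before assured, assured before normal, FIFO within a class). Messages whose claimed priority exceeds the negotiated one are marked `Qosmark: no` and handled as normal traffic.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List

PRIORITIES = ("premium", "assured", "normal")  # highest priority first
# Hypothetical flow-label values; the paper does not specify the encoding.
FLOW_LABEL_TO_PRIORITY = {1: "premium", 2: "assured", 3: "normal"}


@dataclass
class SipMessage:
    text: str
    user: str
    flow_label: int
    headers: Dict[str, str] = field(default_factory=dict)


class DifferentiatedCallProcessor:
    def __init__(self, negotiated: Dict[str, str]):
        self.negotiated = negotiated  # user -> priority agreed through QOSREQUEST
        self.buffers: Dict[str, Deque[SipMessage]] = {p: deque() for p in PRIORITIES}

    def admit(self, msg: SipMessage) -> None:
        """Admission controller: authenticate the priority and insert Qosmark."""
        if "Qosmark" in msg.headers:
            return  # already checked by an upstream SIP server
        claimed = FLOW_LABEL_TO_PRIORITY.get(msg.flow_label, "normal")
        allowed = self.negotiated.get(msg.user, "normal")
        ok = PRIORITIES.index(claimed) >= PRIORITIES.index(allowed)
        msg.headers["Qosmark"] = "ok" if ok else "no"

    def classify(self, msg: SipMessage) -> None:
        """Classifier: one buffer per priority, chosen from the flow label."""
        if msg.headers.get("Qosmark") == "ok":
            prio = FLOW_LABEL_TO_PRIORITY.get(msg.flow_label, "normal")
        else:
            prio = "normal"  # no differentiated processing for this message
        self.buffers[prio].append(msg)

    def receive(self, msg: SipMessage) -> None:
        self.admit(msg)
        self.classify(msg)

    def schedule(self) -> List[SipMessage]:
        """Scheduler: strict priority, FIFO within each priority buffer."""
        execution: List[SipMessage] = []
        for prio in PRIORITIES:
            while self.buffers[prio]:
                execution.append(self.buffers[prio].popleft())
        return execution


if __name__ == "__main__":
    proc = DifferentiatedCallProcessor({"alice": "premium", "bob": "normal"})
    for m in [SipMessage("INVITE b1", "bob", 3), SipMessage("INVITE b2", "bob", 1),
              SipMessage("INVITE a1", "alice", 1), SipMessage("INVITE a2", "alice", 2)]:
        proc.receive(m)
    print([m.text for m in proc.schedule()])
    # ['INVITE a1', 'INVITE a2', 'INVITE b1', 'INVITE b2']
```

Strict priority gives the lowest delay to premium calls but can starve normal calls under sustained premium load; a weighted scheme would be an alternative design.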



4 Implementation 

We used the SIP source code from Columbia University as a reference. SIP6d is
implemented in C on a Linux kernel 2.2.x system environment that supports IPv6. The
GUI of the SIP UA is implemented in Tcl/Tk, and the controller is implemented in
C++. A MySQL server is used to manage user information, and the major modules
are implemented with POSIX threads so that messages can be processed concurrently.



5 Performance Analysis 

We conducted an end-to-end call setup delay experiment for SIP call
processing by setting up an MPLS test network.

Fig. 5 shows the experiment environment. It is composed of two Linux servers 
with SIP6d, two PCs with test programs, and two Linux servers, which are used as 
routers with MPLS modules based on the software provided by Sourceforge.net. 







The experiment adopted the following procedure. First, the test client program
generates an equal number of INVITE messages for each of the three priorities and
sends them simultaneously to SIP6d, using 50, 100, 150, 200, 250, and 300 messages
of each priority. The INVITE messages are sent to the test server program through
two MPLS routers and two SIP6ds. Second, the two SIP6ds process the received
messages using differentiated call processing technology and transfer them to the
test server program. Each message is moved along the LSP predetermined in the
three routers. Next, the test server program sends 200 OK response messages for
the INVITE messages. Then, the test client program receives the 200 OK response
messages coming through the two SIP6ds, and measures the session setup time, i.e.
the average end-to-end call setup delay. For comparison, INVITE messages without
priority are generated and sent to a SIP6d without the differentiated call processing
function, using 150, 300, 450, 600, 750, and 900 messages.

Fig. 6 shows the results of an end-to-end call setup delay experiment in SIP6d that 
applied the differentiated call processing technology and an end-to-end call setup 
delay experiment in SIP6d that did not apply the technology over MPLS Network. 




(A) End-to-end Call Setup Delay in SIP6d supporting Differentiated Call Processing
(B) End-to-end Call Setup Delay in SIP6d
Fig. 6. End-to-End Call Setup Delay in SIP6d over MPLS Network (axis label: Message Number)
Fig. 7 shows the results of an end-to-end call setup delay experiment in SIP6d over
Non-MPLS Network. The INVITE messages are sent to the test server program 
through three IPv6 routers that use static routing and two SIP6ds. 



(A) End-to-end Call Setup Delay in SIP6d supporting Differentiated Call Processing
(B) End-to-end Call Setup Delay in SIP6d
Fig. 7. End-to-End Call Setup Delay in SIP6d over Non-MPLS Network (axis label: Time (sec))

As one can see from the graphs, the SIP6d that supports differentiated call
processing technology shows a difference in call setup delay depending on message
priority. In particular, INVITE messages with premium priority have a very short call
setup delay, and INVITE messages with higher service priority have far shorter call
setup delay than those with lower service priority. In contrast, the SIP6d that does
not support differentiated call processing technology shows no difference in call
setup delay. Also, the end-to-end call setup delay over the MPLS network is lower
than over the non-MPLS network.

The results of the experiments prove that the SIP6d proposed in this paper can 
guarantee end-to-end QoS through a differentiated message processing policy when 
calls are set up in the MPLS network. They also prove that it can provide excellent 
performance in call setups requiring real time or service priority when providing 
voice services in the future NGN based on MPLS. 
6 Conclusion

This paper proposes an architecture that can guarantee end-to-end QoS when VoIP 
calls are set up in the MPLS-based NGN, through QoS resource management and 
differentiated call processing technology, by extending SIP protocol. 

The differentiated call processing technology proposed in this paper reserves 
resources by extending SIP, and minimizes end-to-end call setup delay for specific 
calls by using priority scheduling technology in the application level. It also has an 
advantage of setting up the service priority through the flow label field of IPv6 header, 
considering future MPLS label mapping. A performance analysis has shown that,
unlike existing SIP servers, SIP6d provides a very fast processing rate for messages
with high service priority, through its differentiated QoS function in call processing.
Also, in the MPLS experiment environment, it has been shown that calls with high
service priority have a very short end-to-end call setup delay. These results show that
we can provide excellent performance for call setups that require real-time response
or service priority when providing future voice services in the NGN based on MPLS.

As a follow-up research task, we are investigating a SIP-extended MPLS UNI
protocol, which can reserve and manage QoS resources through the SIP protocol in the
MPLS network.
Abstract. In this paper we present an analytical method to study the distribution
of the backoff delay in an 802.11 DCF WLAN under saturation conditions. We 
show that, with our method, the probability that the delay is below a given threshold 
can be computed accurately and efficiently. We also discuss how our analysis can 
be used to perform admission control on the number of accepted stations in the 
WLAN in order to provide delay assurances to real-time applications. 



1 Introduction 

As 802.11 WLANs see their capacity increased (from the traditional 2 Mbps channel 
capacity to 11 Mbps in 802.11b and 54 Mbps in 802.11a), these networks become 
better suited for the transport of real-time traffic. Since the performance of real-time 
applications is largely dependent on delay, there arises the need for an analysis of the
delay in this type of network.

To date, the analysis of the delay in 802.11 WLANs has received some attention.
The analyses of [1,2,3] are limited to the average delay, which is insufficient to
assess the performance of real-time applications, as these applications require not only
a low average delay but a low delay for all (or most of) their packets. The analyses of
[4,5] overcome this limitation by introducing probability generating functions (pgf’s),
which allow the computation of the probability distribution function (pdf) of the delay.
However, computing pdf values with this method is computationally very costly, and
hence the approaches of [4,5] are of little practical use for tasks such as admission
control. This paper presents an original method to compute the delay distribution
of 802.11 DCF that, in contrast to the previous analyses, is both accurate and efficient.

The analysis of the delay in this paper focuses on the backoff component of the
delay under saturation conditions, hereafter referred to as saturation delay. By backoff
delay we understand the time elapsed from the moment a packet starts its backoff
process until it is successfully transmitted¹. This is one of the main components of the
end-to-end delay. By saturation conditions we mean that all the stations in the WLAN
always have packets to transmit. Note that assuming saturation conditions corresponds
to the worst case and thus provides us with an upper bound on the backoff delay.

* This work has been performed in the context of the Daidalos European IP. 

¹ In case the packet is discarded, we consider its backoff delay to be equal to ∞.
The rest of the paper is structured as follows. In Section 2 we present a brief overview of the 802.11 DCF protocol. In Section 3 we propose a method to analyze the distribution of the saturation delay. In Section 4 we evaluate the performance (namely, accuracy and computational efficiency) of the proposed method. The results obtained show that, with our method, the probability that the delay falls below a certain value can be computed accurately and efficiently. In Section 5 we discuss how our algorithm to compute the saturation delay distribution can be used to perform admission control in a WLAN with real-time traffic in order to provide this traffic type with end-to-end delay guarantees. Finally, in Section 6 we present our concluding remarks.

2 802.11 DCF 

The DCF access method of the IEEE 802.11 standard [6] is based on the CSMA/CA 
protocol. A station with a new packet to transmit senses the channel and, if it remains 
free for a DIFS time, it transmits. If the channel is sensed busy, the station waits until the 
channel becomes idle for a DIFS time, after which it starts a backoff process. Specifically, 
it generates a random backoff time before transmitting. 

The backoff time is chosen from a uniform distribution over the range $[0, CW - 1]$, where the value $CW$ is called the Contention Window and depends on the number of failed transmissions of the packet. At the first transmission attempt, $CW$ is set equal to a value $CW_{min}$, and it is doubled after each unsuccessful transmission, up to a maximum value $CW_{max}$.
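The doubling rule can be illustrated with a short Python sketch that lists the contention window used at each attempt, $CW_k = \min(2^k CW_{min}, CW_{max})$, the same quantity used in Section 3. The function name `contention_windows` and the example values ($CW_{min} = 32$, $CW_{max} = 1024$, a Retry Limit of 7) are illustrative assumptions, not parameters taken from this paper.

```python
def contention_windows(cw_min: int, cw_max: int, retry_limit: int) -> list[int]:
    """Contention window CW_k for attempts k = 0, ..., retry_limit.

    CW starts at cw_min and doubles after each failed attempt,
    saturating at cw_max.
    """
    return [min(2**k * cw_min, cw_max) for k in range(retry_limit + 1)]


# Illustrative (assumed) parameters.
print(contention_windows(32, 1024, 7))
# → [32, 64, 128, 256, 512, 1024, 1024, 1024]
```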

The backoff time is decremented once every time interval $T_e$ during which the channel is detected empty, “frozen” when a transmission is detected on the channel, and reactivated when the channel is sensed empty again for a DIFS time (if the transmission is detected as successful) or an EIFS time (if it is detected as unsuccessful). The station transmits when the backoff time reaches zero.

If the packet is correctly received, the receiving station sends an ACK frame after a 
SIFS time. If the ACK frame is not received within an ACK Timeout time, a collision is 
assumed to have occurred and the packet transmission is rescheduled according to the 
given backoff rules. If the number of retransmissions reaches a predefined Retry Limit, 
the packet is discarded. Upon completing the transmission (either with a success or with 
a discard), the transmitting station resets the CW to its initial value and starts a new 
backoff process; before this ends, a new packet cannot be transmitted. 
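To make the retry and discard rules concrete, here is a minimal Monte Carlo sketch of one packet's backoff. It assumes, purely for illustration, that every attempt collides independently with a fixed probability `p_col` (this anticipates the first approximation of Section 3 and is not part of the protocol itself); the function name and parameters are hypothetical. It returns the number of collisions and the total number of backoff slots counted down, or `None` if the Retry Limit is exhausted and the packet is discarded.

```python
import random


def sample_backoff(cw_min, cw_max, retry_limit, p_col, rng):
    """One packet's backoff under an assumed constant collision probability.

    Returns (collisions, slots) on success, or None if the packet is discarded.
    At attempt k the backoff is drawn uniformly from {0, ..., CW_k - 1}.
    """
    slots = 0
    for k in range(retry_limit + 1):
        cw = min(2**k * cw_min, cw_max)
        slots += rng.randrange(cw)      # uniform on {0, ..., cw - 1}
        if rng.random() >= p_col:       # attempt k succeeds
            return k, slots
    return None                         # Retry Limit reached: packet discarded


rng = random.Random(1)
runs = [sample_backoff(32, 1024, 7, 0.5, rng) for _ in range(100_000)]
discard_rate = sum(r is None for r in runs) / len(runs)
print(f"discarded: {discard_rate:.4f} (theory: 0.5**8 = {0.5**8:.4f})")
```

With a constant collision probability, a packet is discarded only if all $R + 1$ attempts collide, which is why the simulated discard rate should approach $0.5^{8}$ here.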

The use of the Request to Send (RTS) / Clear to Send (CTS) mechanism is optional in 802.11. When this option is applied, upon the backoff counter reaching zero, the transmitting station sends an RTS frame to the receiving station, which responds with a CTS frame. The packet is then sent when the transmitting station receives the CTS.

3 Saturation Delay Analysis 

In this section we propose an analytical model to compute the distribution of the saturation delay. We first analyze the simplified case in which all packets have the same fixed length and the RTS/CTS mechanism is not used, and then propose two extensions of the basic analysis to account for the RTS/CTS mechanism and for variable packet lengths.
3.1 Basic Analysis

Let us consider a WLAN with N stations operating under saturation conditions and 
sending packets of a fixed packet length l. Our objective is to compute the probability 
that, under these conditions, a packet transmission of a tagged station experiences a 
saturation delay smaller than a given value D. We denote this probability by P(d < D). 

Fig. 1 illustrates the different components of the saturation delay. Applying the law of total probability, $P(d < D)$ can be decomposed as follows:

$$P(d < D) = \sum_{i=0}^{R} P(d < D \mid i \text{ col})\, P(i \text{ col}) \tag{1}$$

where $P(i \text{ col})$ represents the probability that a packet suffers $i$ collisions before being successfully transmitted and $R$ is the Retry Limit.




[Fig. 1 sketch: consecutive backoff stages of unif(0, CW_0), unif(0, CW_1), ..., unif(0, CW_i) slot times.]

Fig. 1. Saturation delay.



Let us define a slot time as the time interval between two consecutive backoff time decrements of the tagged station. Note that, according to this definition, a slot time may be either empty or contain the transmission of one or more stations. Applying the law of total probability to the previous equation, conditioning on the total number of slot times the tagged station counts down before transmitting successfully, we have

$$P(d < D) = \sum_{i=0}^{R} \sum_{j=0}^{W_i} P(d < D \mid i \text{ col}, j \text{ slots})\, P(j \text{ slots} \mid i \text{ col})\, P(i \text{ col}) \tag{2}$$

where $W_i = \sum_{k=0}^{i} (CW_k - 1)$, with $CW_k = \min(2^k CW_{min}, CW_{max})$, and $P(j \text{ slots} \mid i \text{ col})$ is the probability that the sum of the $i + 1$ backoff times of the packet equals $j$,

$$P(j \text{ slots} \mid i \text{ col}) = P\left( \sum_{k=0}^{i} \mathrm{unif}(0, CW_k - 1) = j \right) \tag{3}$$

where $\mathrm{unif}(0, C)$ represents a discrete random variable uniformly distributed on $\{0, 1, \ldots, C\}$.

As the probability mass function (pmf) of a sum of independent discrete random variables is equal to the convolution of the individual pmfs, we can compute $P(j \text{ slots} \mid i \text{ col})$ as follows:

$$P(j \text{ slots} \mid i \text{ col}) = (f_0 * f_1 * \cdots * f_i)_j \tag{4}$$

where $f_k$ is the pmf of $\mathrm{unif}(0, CW_k - 1)$. We compute the above convolution with Fast Fourier Transforms (FFTs), which provide a very efficient means of computing convolutions.
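A minimal NumPy sketch of Eqs. (3)-(4) follows. It multiplies the zero-padded FFTs of the uniform pmfs $f_0, \ldots, f_i$ and inverts the product. The function name `slots_pmf` and the use of NumPy's real FFT (`numpy.fft.rfft`/`irfft`) are our choices for illustration; the paper does not specify its FFT implementation. The transform length is padded to a power of two that is at least $W_i + 1$, so the circular convolution computed by the FFT coincides with the linear one.

```python
import numpy as np


def slots_pmf(cws):
    """pmf of sum_k unif(0, CW_k - 1) over the given windows (Eqs. (3)-(4)).

    Entry j is P(j slots | i col) for i = len(cws) - 1, on the support
    {0, ..., W_i} with W_i = sum_k (CW_k - 1).
    """
    n = sum(cw - 1 for cw in cws) + 1           # support size W_i + 1
    size = 1 << max(n - 1, 0).bit_length()      # power-of-two FFT length >= n
    spectrum = np.ones(size // 2 + 1, dtype=complex)
    for cw in cws:
        f_k = np.full(cw, 1.0 / cw)              # pmf of unif(0, CW_k - 1)
        spectrum *= np.fft.rfft(f_k, size)       # zero-padded transform
    pmf = np.fft.irfft(spectrum, size)[:n]
    return np.clip(pmf, 0.0, None)               # remove tiny negative round-off


pmf = slots_pmf([32, 64, 128])   # i = 2 collisions, illustrative windows
print(len(pmf), pmf.sum())       # 222 entries (W_2 = 221), total mass ~1
```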

Let $\tau$ be the probability that a station transmits in a slot time in a WLAN with $N$ stations under saturation conditions. Following the analysis of [7], we compute $\tau$ by solving the non-linear equation resulting from the following two equations:

$$p = 1 - (1 - \tau)^{N-1} \tag{5}$$

and

$$\tau = \frac{2(1-2p)(1-p^{R+1})}{W\left(1-(2p)^{m+1}\right)(1-p) + (1-2p)\left[(1-p^{R+1}) + W\, 2^m p^{m+1}(1-p^{R-m})\right]} \tag{6}$$

where $R$ is the Retry Limit, $W = CW_{min} + 1$, $m$ is such that $CW_{max} = 2^m CW_{min}$, and $p$ is the probability that a transmission attempt collides.
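The fixed point of Eqs. (5)-(6) can be found numerically. The sketch below is one possible approach (bisection on $p$, which is not necessarily the procedure used in [7]). For numerical robustness it evaluates Eq. (6) through the ratio $\tau = \sum_{i=0}^{R} p^i \big/ \sum_{i=0}^{R} p^i (W_i + 1)/2$ with $W_i = 2^{\min(i,m)} W$. For $m \le R$ (the usual case), this ratio is algebraically identical to Eq. (6) but has no 0/0 at $p = 1/2$. The function `tau_eq6` keeps the printed form for comparison. All names and the parameter values in the usage line are assumptions for illustration.

```python
def tau_of_p(p, W, m, R):
    """Eq. (6) as (sum_i p^i) / (sum_i p^i (W_i + 1) / 2), W_i = 2^min(i, m) W.

    Equal to Eq. (6) for m <= R, but well defined at p = 1/2.
    """
    attempts = sum(p**i for i in range(R + 1))
    slots = sum(p**i * (2**min(i, m) * W + 1) / 2 for i in range(R + 1))
    return attempts / slots


def tau_eq6(p, W, m, R):
    """Eq. (6) exactly as printed (undefined at p = 1/2)."""
    num = 2 * (1 - 2 * p) * (1 - p**(R + 1))
    den = (W * (1 - (2 * p)**(m + 1)) * (1 - p)
           + (1 - 2 * p) * ((1 - p**(R + 1))
                            + W * 2**m * p**(m + 1) * (1 - p**(R - m))))
    return num / den


def solve_tau(N, W, m, R, iters=200):
    """Solve Eqs. (5)-(6) jointly by bisection on p in [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        p = 0.5 * (lo + hi)
        # g(p) = p - (1 - (1 - tau(p))^(N-1)) is increasing in p
        g = p - (1 - (1 - tau_of_p(p, W, m, R))**(N - 1))
        if g < 0:
            lo = p
        else:
            hi = p
    p = 0.5 * (lo + hi)
    return tau_of_p(p, W, m, R), p


tau, p = solve_tau(N=10, W=32, m=5, R=7)   # illustrative, assumed parameters
print(f"tau = {tau:.4f}, p = {p:.4f}")
```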

The first approximation upon which we base our analysis is the same as in [8]: we assume that a station other than the tagged station transmits in each slot time with a constant and independent probability $\tau$. With this assumption, the probability that the tagged station suffers $i$ collisions before transmitting successfully can be computed as

$$P(i \text{ col}) = P_c^{\,i} P_s = \left(1 - (1 - \tau)^{N-1}\right)^i (1 - \tau)^{N-1} \tag{7}$$

where $P_s$ corresponds to the probability that a transmission of the tagged station is successful (i.e., none of the other $N - 1$ stations transmits) and $P_c$ to the probability that it collides (i.e., some other station transmits).
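Equation (7) is a truncated geometric distribution. The sketch below (function name hypothetical, $\tau$ value assumed) evaluates it and also returns the leftover mass $P_c^{R+1}$. This is the probability that the packet is discarded, to which footnote 1 assigns an infinite backoff delay, so it never contributes to $P(d < D)$.

```python
def collision_pmf(tau, N, R):
    """Eq. (7): P(i col) = Pc**i * Ps for i = 0, ..., R.

    Ps = (1 - tau)**(N - 1): none of the other N - 1 stations transmits.
    Also returns Pc**(R + 1), the probability that the packet is discarded.
    """
    ps = (1 - tau) ** (N - 1)
    pc = 1 - ps
    return [pc**i * ps for i in range(R + 1)], pc ** (R + 1)


probs, p_discard = collision_pmf(tau=0.03, N=10, R=7)   # assumed tau
```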

Our second approximation² is to assume that the saturation delay given $i$ collisions and $j$ slot times is a Gaussian random variable, which we denote by $d_{ij}$. Note that, assuming independence between different slot times (which follows from the first approximation) and a large enough number of slot times (which is the typical case), the Central Limit Theorem ensures that this approximation is accurate.

With the above approximation, it suffices to know the mean and the standard deviation of $d_{ij}$ (which we denote by $m_{ij}$ and $\sigma_{ij}$, respectively) to compute $P(d < D \mid i \text{ col}, j \text{ slots})$:

$$P(d < D \mid i \text{ col}, j \text{ slots}) = \begin{cases} \dfrac{1}{2} + \dfrac{1}{2}\,\mathrm{erf}\left(\dfrac{D - m_{ij}}{\sqrt{2}\,\sigma_{ij}}\right), & \dfrac{D - m_{ij}}{\sigma_{ij}} \ge 0 \\[2ex] \dfrac{1}{2} - \dfrac{1}{2}\,\mathrm{erf}\left(\dfrac{m_{ij} - D}{\sqrt{2}\,\sigma_{ij}}\right), & \dfrac{D - m_{ij}}{\sigma_{ij}} < 0 \end{cases} \tag{8}$$
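Because erf is an odd function, both branches of Eq. (8) reduce to the single expression $\frac12 + \frac12\,\mathrm{erf}\big((D - m_{ij})/(\sqrt{2}\,\sigma_{ij})\big)$, the Gaussian cdf. A minimal Python version using `math.erf` from the standard library is shown below. The name `p_below_gaussian` and the handling of $\sigma_{ij} = 0$ are our own additions; that case occurs for $j = 0$ in the fixed-length model, where $d_{ij}$ is deterministic.

```python
import math


def p_below_gaussian(D, m_ij, sigma_ij):
    """Eq. (8): P(d < D | i col, j slots) for d_ij ~ Normal(m_ij, sigma_ij**2).

    Both branches of Eq. (8) equal 1/2 + 1/2 erf((D - m_ij) / (sqrt(2) sigma_ij))
    because erf is odd, so one expression covers them.
    """
    if sigma_ij == 0.0:  # degenerate case: d_ij is deterministic
        return 1.0 if m_ij < D else 0.0
    return 0.5 + 0.5 * math.erf((D - m_ij) / (math.sqrt(2.0) * sigma_ij))
```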



Given the assumption of independence between different slot times, $m_{ij}$ can be computed as the sum of the average durations of all the slot times in $d_{ij}$:

$$m_{ij} = j\, m_n + i\, T_c + T_s \tag{9}$$

² This approximation is the key difference between our model and the analyses of [4,5]. With our approximation, we only need to compute the mean and standard deviation of $d_{ij}$, which can be done efficiently, whereas [4,5] compute all the possible values of $d_{ij}$ and their probabilities, which is computationally very costly because $d_{ij}$ can take a very large number of different values.
where $m_n$ is the average duration of a slot time in which the tagged station does not transmit, $T_c$ is the duration of a slot time that contains a collision, and $T_s$ is the duration of a slot time that contains a successful transmission.

The duration of a slot time that contains a successful transmission is equal to [9]

$$T_s = T_{PLCP} + \frac{H + l}{C} + SIFS + \frac{ACK}{C} + DIFS \tag{10}$$

where $T_{PLCP}$ is the PLCP (Physical Layer Convergence Protocol) preamble and header transmission time, $H$ is the MAC overhead (header and FCS), $l$ is the packet length, $ACK$ is the length of an ACK frame and $C$ is the channel bit rate.

Similarly, the duration of a slot time that contains a collision is equal to

$$T_c = T_{PLCP} + \frac{H + l}{C} + EIFS \tag{11}$$

The average duration of a slot time in which the tagged station does not transmit, $m_n$, is computed as

$$m_n = P_{s,n} T_s + P_{c,n} T_c + P_{e,n} T_e \tag{12}$$

where $P_{s,n}$ represents the probability that a slot time in which the tagged station does not transmit contains a successful transmission, $P_{c,n}$ the probability that it contains a collision, and $P_{e,n}$ the probability that it is empty.

$P_{s,n}$, $P_{e,n}$ and $P_{c,n}$ can be computed from $\tau$ and $N$ as

$$P_{s,n} = (N - 1)\,\tau (1 - \tau)^{N-2}, \qquad P_{e,n} = (1 - \tau)^{N-1} \tag{13}$$

and

$$P_{c,n} = 1 - P_{s,n} - P_{e,n} \tag{14}$$
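Equations (10)-(14) translate directly into code. In the sketch below the function names are hypothetical, and so are the example numbers: 802.11b-like values of 11 Mbps, a 192 µs PLCP preamble and header, SIFS = 10 µs, DIFS = 50 µs, EIFS = 364 µs and $T_e$ = 20 µs, plus an assumed 34-byte MAC overhead and 14-byte ACK. These are assumptions, not the exact parameter set of Section 4. Times are in seconds and lengths in bits.

```python
def slot_durations(l_bits, C, T_plcp, H_bits, ack_bits, sifs, difs, eifs):
    """Eqs. (10)-(11): success-slot and collision-slot durations (seconds)."""
    T_s = T_plcp + (H_bits + l_bits) / C + sifs + ack_bits / C + difs
    T_c = T_plcp + (H_bits + l_bits) / C + eifs
    return T_s, T_c


def mean_other_slot(tau, N, T_s, T_c, T_e):
    """Eqs. (12)-(14): mean duration m_n of a slot in which the tagged station is silent.

    Returns m_n and the tuple (P_sn, P_cn, P_en).
    """
    P_sn = (N - 1) * tau * (1 - tau) ** (N - 2)   # exactly one of the other N-1 transmits
    P_en = (1 - tau) ** (N - 1)                   # none of them transmits
    P_cn = 1 - P_sn - P_en                        # two or more of them transmit
    return P_sn * T_s + P_cn * T_c + P_en * T_e, (P_sn, P_cn, P_en)


# Illustrative (assumed) 802.11b-like parameters and tau.
T_s, T_c = slot_durations(l_bits=1000 * 8, C=11e6, T_plcp=192e-6, H_bits=34 * 8,
                          ack_bits=14 * 8, sifs=10e-6, difs=50e-6, eifs=364e-6)
m_n, _ = mean_other_slot(tau=0.03, N=10, T_s=T_s, T_c=T_c, T_e=20e-6)
```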

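With the assumption of independence between different slot times, the standard deviation $\sigma_{ij}$ can be computed from

$$\sigma_{ij} = \sqrt{j\,\sigma_n^2} \tag{15}$$

with

$$\sigma_n^2 = P_{s,n} T_s^2 + P_{c,n} T_c^2 + P_{e,n} T_e^2 - m_n^2 \tag{16}$$

Since the $i$ collision slots and the final successful slot have the fixed durations $T_c$ and $T_s$, only the $j$ slots in which the tagged station does not transmit contribute to the variance. This closes the analysis.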
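Putting Eqs. (2), (4), (7), (8), (9) and (12)-(16) together gives the complete basic-model computation. The sketch below takes $\tau$ as an input (e.g., obtained from Eqs. (5)-(6)). For brevity, it builds the pmf of Eq. (4) with direct convolution (`numpy.convolve`) instead of FFTs. Function and parameter names, and the numbers in the usage line, are illustrative assumptions. Packets discarded after $R + 1$ collisions are simply excluded, consistent with their delay being treated as infinite.

```python
import math

import numpy as np


def _gauss_cdf(x, mean, std):
    """Eq. (8): P(d_ij < x) for a Gaussian d_ij; deterministic when std == 0."""
    if std == 0.0:
        return 1.0 if mean < x else 0.0
    return 0.5 + 0.5 * math.erf((x - mean) / (math.sqrt(2.0) * std))


def saturation_delay_cdf(D, tau, N, cws, T_s, T_c, T_e):
    """Eq. (2): P(d < D) for the basic model; cws = [CW_0, ..., CW_R]."""
    # Statistics of a slot in which the tagged station is silent, Eqs. (12)-(16).
    P_sn = (N - 1) * tau * (1 - tau) ** (N - 2)
    P_en = (1 - tau) ** (N - 1)
    P_cn = 1 - P_sn - P_en
    m_n = P_sn * T_s + P_cn * T_c + P_en * T_e
    var_n = max(P_sn * T_s**2 + P_cn * T_c**2 + P_en * T_e**2 - m_n**2, 0.0)

    ps = (1 - tau) ** (N - 1)            # success probability of an attempt, Eq. (7)
    pc = 1 - ps
    total = 0.0
    pmf = np.ones(1)                     # pmf of the number of slots counted down
    for i, cw in enumerate(cws):         # i = number of collisions
        pmf = np.convolve(pmf, np.full(cw, 1.0 / cw))   # Eq. (4), one more stage
        cond = [
            _gauss_cdf(D, j * m_n + i * T_c + T_s,      # m_ij, Eq. (9)
                       math.sqrt(j * var_n))            # sigma_ij, Eq. (15)
            for j in range(len(pmf))
        ]
        total += pc**i * ps * float(np.dot(cond, pmf))  # Eqs. (2) and (7)
    return total


# Illustrative (assumed) inputs: tau from Eqs. (5)-(6), durations from Eqs. (10)-(11).
cws = [min(2**k * 32, 1024) for k in range(8)]
print(saturation_delay_cdf(D=20e-3, tau=0.03, N=10, cws=cws,
                           T_s=1.014e-3, T_c=1.308e-3, T_e=20e-6))
```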



3.2 RTS/CTS 

In case the RTS/CTS option is used, successful packets are preceded by an RTS/CTS exchange, while collisions occur with RTS frames instead of data packets. Accordingly, the durations of the slot times containing a successful transmission and a collision have to be computed as in [9] for the RTS/CTS case. With this modification alone, the analysis of the previous subsection can be used to compute the saturation delay distribution for the RTS/CTS case.
3.3 Non-Fixed Packet Lengths

Next, we extend our basic model to the case in which packet lengths are not fixed but follow a certain distribution. Specifically, we consider that a packet length takes a value $l$ of the set $L$ with probability $p_l$, $L$ being the set of all possible packet lengths. For simplicity, we assume that all stations have the same packet length distribution; however, the analysis would be very similar if this condition did not hold.

In order to account for non-fixed packet lengths, we have to modify the expressions used to obtain the $m_{ij}$ and $\sigma_{ij}$ values. $m_{ij}$ is computed as

$$m_{ij} = j\, m_n + i\, m_c + m_s \tag{17}$$

where $m_n$ is the average duration of a slot time in which the tagged station does not transmit, $m_c$ is the average duration of a slot time in which the tagged station collides and $m_s$ is the average duration of a slot time in which the tagged station transmits a packet successfully.

The average duration of a slot time in which the tagged station does not transmit, $m_n$, is computed as

$$m_n = \sum_{l \in L} P_{s,l,n}\, T_{s,l} + \sum_{l \in L} P_{c,l,n}\, T_{c,l} + P_{e,n} T_e \tag{18}$$

where $P_{s,l,n}$ represents the probability that a slot time in which the tagged station does not transmit contains a successful transmission of a packet of length $l$, $P_{c,l,n}$ the probability that it contains a collision in which the longest packet involved has length $l$, and $T_{s,l}$ and $T_{c,l}$ are the slot time durations in each case.

$P_{s,l,n}$ and $P_{c,l,n}$ are computed as

$$P_{s,l,n} = (N - 1)\,\tau (1 - \tau)^{N-2} p_l \quad \text{and} \quad P_{c,l,n} = (1 - P_{s,n} - P_{e,n})\, P_{c,l} \tag{19}$$

where $P_{s,n}$ and $P_{e,n}$ are given by Eq. (13) and $P_{c,l}$ is the probability that the longest packet involved in a collision is of length $l$. Neglecting collisions of more than two stations,

$$P_{c,l} = 2 p_l \sum_{k \in L_l} p_k - p_l^2 \tag{20}$$

where $L_l$ is the set of all the packet lengths smaller than or equal to $l$.

The duration of a slot time that contains a successful transmission of a packet of length $l$, $T_{s,l}$, and the duration of a slot time that contains a collision of two packets, the longest of length $l$, $T_{c,l}$, can be computed following Eqs. (10) and (11).
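Equation (20) is the pmf of the larger of two independent lengths drawn from $\{p_l\}$. With $F(l) = \sum_{k \in L_l} p_k$, we have $P(\max \le l) = F(l)^2$, and $F(l)^2 - (F(l) - p_l)^2 = 2 p_l F(l) - p_l^2$. A short sketch follows; the function name and the example length distribution are hypothetical (they are not the measured distribution of [10]).

```python
def longest_in_collision(pmf):
    """Eq. (20): P_{c,l} = 2 p_l * sum_{k <= l} p_k - p_l**2 for each length l.

    pmf maps packet length -> probability p_l (two-station collisions only).
    """
    out, cdf = {}, 0.0
    for l in sorted(pmf):
        p_l = pmf[l]
        cdf += p_l                        # F(l) = P(length <= l)
        out[l] = 2 * p_l * cdf - p_l**2
    return out


# Hypothetical length distribution in bytes (not the data of [10]).
print(longest_in_collision({40: 0.5, 576: 0.2, 1500: 0.3}))
```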

Finally, the standard deviation $\sigma_{ij}$ for the non-fixed packet length case can be computed from

$$\sigma_{ij} = \sqrt{j\,\sigma_n^2} \tag{21}$$

with

$$\sigma_n^2 = \sum_{l \in L} P_{s,l,n}\, T_{s,l}^2 + \sum_{l \in L} P_{c,l,n}\, T_{c,l}^2 + P_{e,n} T_e^2 - m_n^2$$
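For the non-fixed length case, Eqs. (18)-(20) and the expression for $\sigma_n^2$ above can be evaluated in a single pass over the length distribution. In this sketch (names hypothetical), `T_s` and `T_c` are dictionaries mapping a length $l$ to $T_{s,l}$ and $T_{c,l}$, computed for instance with Eqs. (10)-(11). With a single packet length it reduces to Eqs. (12)-(16) of the fixed-length case. The length distribution in the usage lines is an assumption.

```python
def other_slot_stats(tau, N, pmf, T_s, T_c, T_e):
    """Mean m_n (Eq. (18)) and variance sigma_n**2 of a slot in which the
    tagged station is silent, for a packet-length pmf {l: p_l}.

    T_s[l], T_c[l]: durations of a success of length l and of a two-packet
    collision whose longest packet has length l (e.g. from Eqs. (10)-(11)).
    """
    P_sn = (N - 1) * tau * (1 - tau) ** (N - 2)   # some successful transmission
    P_en = (1 - tau) ** (N - 1)                   # empty slot
    P_cn = 1 - P_sn - P_en                        # some collision
    mean, second = P_en * T_e, P_en * T_e**2
    cdf = 0.0
    for l in sorted(pmf):
        p_l = pmf[l]
        cdf += p_l
        P_sln = P_sn * p_l                        # Eq. (19), success of length l
        P_cln = P_cn * (2 * p_l * cdf - p_l**2)   # Eqs. (19)-(20)
        mean += P_sln * T_s[l] + P_cln * T_c[l]
        second += P_sln * T_s[l] ** 2 + P_cln * T_c[l] ** 2
    return mean, second - mean**2


pmf = {40 * 8: 0.5, 1500 * 8: 0.5}     # hypothetical lengths in bits
T_s = {l: 192e-6 + (272 + l) / 11e6 + 10e-6 + 112 / 11e6 + 50e-6 for l in pmf}
T_c = {l: 192e-6 + (272 + l) / 11e6 + 364e-6 for l in pmf}
m_n, var_n = other_slot_stats(0.03, 10, pmf, T_s, T_c, 20e-6)
```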
4 Performance Evaluation

Next, we evaluate the accuracy and computational efficiency of the proposed model. The values of the system parameters used to obtain the results, both for the analytical model and for the simulation runs, have been taken from the 802.11b physical layer. The packet length has been set to 1000 bytes for the fixed packet length case, and derived from the measurements of Internet traffic presented in [10] for the non-fixed packet length case. Simulations are performed with an event-driven simulator developed by us that closely follows the details of the 802.11 DCF protocol for each independently transmitting station.

Figs. 2, 3 and 4 illustrate the cumulative distribution function (cdf) of the saturation delay, i.e., $P(d < D)$ as a function of $D$, for our basic model, the RTS/CTS extension and the non-fixed packet lengths extension, respectively. Analytical results are represented with lines and simulations with points. Simulation results are given with 95% confidence intervals below 0.1%. The results show that our analysis is very accurate: in all cases, and for all values of $D$ and $N$, simulations coincide almost exactly with analytical results. In addition, the results corroborate the intuition that delays are smaller in the RTS/CTS and non-fixed packet length cases (the latter because smaller packets are transmitted).

In order to evaluate the computational efficiency of our method, we measured the time required to compute the cdf values given in Figs. 2, 3 and 4. Measurements were taken on a Pentium 4 PC with a 2.66 GHz CPU and 192 MB of RAM, running the Linux operating system. For all models (basic, RTS/CTS and non-fixed packet lengths) and different values of $N$ (2, 10, 30 and 100), the time required to compute the 20 cdf values given in each graph ranged from 0.37 to 0.45 seconds. These results show that, with the proposed model, the time required to compute the $P(d < D)$ values remains very low (in all cases below 0.5 seconds for 20 points) and, moreover, is practically constant (almost independent of the model and of $N$). We believe that these results, even though obtained on a single platform and with code that is not necessarily optimized, demonstrate the low computational cost of our algorithm. Note that the measured times (on the order of 0.5 seconds) are fully acceptable for taking an admission control decision. Moreover, as discussed in the next section, in some situations a single $P(d < D)$ value may be enough for admission control, in which case the time needed for an admission control decision may be much smaller still.

[Figs. 2-4 plot P(d < D) against D (ms).]

Fig. 2. Saturation delay cdf: Basic Model.

Fig. 3. Saturation delay cdf: RTS/CTS.

Fig. 4. Saturation delay cdf: Non-fixed packet lengths.

5 Discussion on End-to-End Delay Guarantees in WLANs 

The method we have proposed in this paper allows computing the distribution of the backoff delay under saturation conditions. The backoff delay is one of the main components of the end-to-end delay, but not necessarily the only one. Real-time applications require the end-to-end delay (i.e., the sum of all the delay components) to be below a certain threshold (at least for most of the packets); otherwise, their performance is unsatisfactory. In this section we discuss how our method can be used to derive the worst-case distribution of the end-to-end delay, and thus to provide end-to-end delay guarantees by means of admission control.

The fact that our model assumes saturation conditions represents the worst possible case for a tagged station, as this station experiences the largest delays when all the other stations always have packets to transmit. Therefore, this is the case that should be considered if our goal is to provide end-to-end delay guarantees by limiting the number of stations in the WLAN through admission control. Many of the previous delay analyses of DCF (namely, [1,2,3]) also assume saturation conditions.

If we consider an end-to-end communication between two WLAN stations, or between a WLAN station and the Access Point, then the end-to-end delay consists of two main components: the backoff delay and the queuing delay. The first is the time elapsed from when a packet starts its backoff process until it is successfully transmitted, while the second is the time elapsed from the generation of a packet until it reaches the first position of the transmission buffer. The backoff component of the delay is accurately characterized in the present paper. An open issue is the computation of the queuing delay.

The problem of computing the queuing delay in the above case can be seen as analyzing a classical G/G/1 queue, in which the arrivals follow the packet arrival process at the station and the service time follows the distribution of the backoff delay (which has been characterized in this paper). This problem can be dealt with using classical queuing theory [11]; this is the approach taken by [4,5].

The 802.11 standard allows a station, once it gets access to the channel, to send not only one but multiple packets separated by SIFS times. This option is appropriate, e.g., for voice sources, because of the stringent delay requirements of their packets, and also because the short length of voice packets would otherwise make the protocol overhead very high. For a tagged station that uses this option and sends all the packets waiting in its buffer every time it gets access to the channel, the end-to-end delay consists of the backoff delay only, and therefore the model presented in this paper can be used to characterize the end-to-end delay.



6 Summary and Final Remarks 



As the capacity of WLANs and their use by real-time applications increase, there is a growing need to better understand and predict the delay behavior of this type of network. In this paper we have proposed a method to compute accurately and efficiently the distribution of the backoff delay in 802.11 DCF under saturation conditions. The proposed method is a first step towards an admission control algorithm that, by limiting the number of stations in the WLAN, ensures end-to-end delays low enough for real-time applications.

The backoff delay experienced by a station can be interpreted as the service time seen 
by its internal queue. Then, classical queuing theory can be used to derive the queuing 
delay, given the characterization of the backoff delay obtained in this paper. If a station 
sends all its waiting packets when it accesses the channel, the backoff delay derived here 
is the only component of the end-to-end delay. 

Our model of the backoff delay of a tagged station assumes that all other stations always have packets to transmit. As this corresponds to the worst case for the delay of the tagged station, the results obtained represent an upper bound and are therefore appropriate for providing the tagged station with delay guarantees. However, our analysis could also be reused for non-saturation conditions if the $\tau$ probabilities under non-saturation conditions were given (a rough approximation to compute them is proposed in [4]).
In the literature, there have been many protocol proposals for WLAN that, unlike
DCF, have been designed specifically to satisfy the delay requirements of real-time 
applications (see e.g. [12,13,14]). The PCF scheme of 802.11 [6] was also designed 
with a similar intention. However, none of these (including PCF) is widely deployed 
today, which leaves DCF as the only option to provide real-time traffic communication 
in today’s WLANs. 

The IEEE 802.11 WG is currently carrying out a standardization activity to extend the 802.11 protocol with QoS support, which will lead to the upcoming 802.11e standard. The EDCA access mechanism of 802.11e is an extension of the DCF protocol. We believe that our analysis provides a basis that can be extended to analyze the delay of 802.11e EDCA.
Abstract. Although the performance analysis of the IEEE 802.11 Distributed Coordination Function (DCF) in saturation state has been extensively studied in the literature, little work has addressed performance analysis in non-saturation state. In this paper, a simple model is proposed to analyze the performance of IEEE 802.11 DCF with service differentiation support in non-saturation states, which helps to obtain a deeper insight into IEEE 802.11 DCF. Based on the proposed model, we can approximately evaluate the most important system performance measures, such as packet delays, which provides an important tool to predict and optimize system performance. Moreover, a practical method to meet packet delay requirements is presented based on our theoretical results. Comparisons with simulations show that this method achieves the specified packet delay requirements with good accuracy.

Keywords: Wireless LAN, IEEE 802.11, Quality of Service Guarantee, Service 
Differentiation 



1 Introduction 

In recent years, IEEE 802.11 has become one of the most important international standards for Wireless Local Area Networks (WLANs) [1]. In the IEEE 802.11 protocol, the fundamental mechanism to access the medium is the Distributed Coordination Function (DCF), which is a random access scheme based on the carrier sense multiple access with collision avoidance (CSMA/CA) protocol. Many performance analyses of 802.11 have been proposed, such as those in [2]-[5]. However, these papers assume saturation state, that is, that the transmission queue of each station is always nonempty, which is not realistic in real-world systems. In [6] and [7], more practical queuing models for IEEE 802.11 DCF that incorporate practical packet arrival processes are proposed. However, the service rate of each node is still based on the results obtained in [4], where saturation state is assumed. This limitation is overcome in [8], where performance analysis in non-saturation state is carried out by introducing probability generating functions, which allow the computation of the probability distribution function (pdf) of the delay. However, computing pdf values with the proposed method has a high computational cost, and therefore the approach is of limited practical use. Another drawback is that the complex analysis method in [8] is of little help in obtaining deeper insight into the relationships among different system parameters. Moreover, service differentiation support is not considered. In this paper, based on our former work in [9]-[10], a simple analysis model is proposed to analyze the performance of an enhanced 802.11 DCF with service differentiation support in non-saturation state. We considered the following objectives when defining the model.

1. The analysis model should be simple enough to obtain a clear insight into 
relationships among the most important system parameters. 

2. The analysis model should be as practical as possible, so that it can be 
implemented in real-world systems. 

3. Service differentiation must be considered. 



2 Performance Analysis 

We consider a single-hop wireless LAN, where stations can “hear” each other well. It is assumed that the channel conditions are ideal (i.e., no hidden terminals and no capture). $M$ types of traffic are considered, with $n_i$ type $i$ stations ($i = 1, \ldots, M$), and, for simplicity, each station carries only one traffic flow. If the station is busy when a packet arrives, the packet must wait in the corresponding transmission queue. The buffer size is assumed to be infinite. It is assumed that the packet arrival processes of type $i$ traffic flows are independent and identically distributed (i.i.d.), with mean packet inter-arrival duration $T_{p,i}$. The model can accommodate different arrival processes. Moreover, it is assumed that all packets have the same payload length, whose transmission takes a duration $P_L$. It is also assumed that a backoff process starts immediately when the current packet arrives at the head of the queue.

In the following, a type $i$ traffic flow is considered. Let $b_i(t)$ be the stochastic process representing its backoff time counter. Moreover, let $W_i = CW_{min,i}$ denote its minimum contention window, and let $m_i$, the “maximum backoff stage”, be the value such that $CW_{max,i} = 2^{m_i} W_i$. Let $s_i(t)$ be the stochastic process representing its backoff stage $(0, 1, \ldots, m_i)$. A two-dimensional discrete-time Markov chain (shown in Fig. 1) is used to model the behavior of the traffic flow. The states are defined as combinations of two integers $\{s_i(t), b_i(t)\}$. It should be noted that, apart from the states $\{s_i(t), b_i(t)\}$, a state VTSS (Virtual Time Slot State) is used to model the case in which a traffic flow has finished sending a packet and is waiting for the next one. In order to make the system tractable with a discrete-time Markov chain, the VTSS is subdivided into VTSs (Virtual Time Slots), whose duration is the same as that of a time slot in the backoff process. We assume that the station checks whether a packet is available for transmission only at the end of a VTS. In this way, the behavior of the traffic flow in VTSS can be modeled in the same way as the actual backoff process.
For clarity, the above approximated version of DCF is called ADCF. This approximation has very little influence on the final system performance, as verified by extensive simulations. If the packet transmission queue is found to be nonempty after the current packet has been sent, the traffic flow transits from VTSS to a backoff state. Otherwise, the traffic flow keeps waiting in VTSS for the arrival of the next packet. From Fig. 1, it can be seen that after a packet has been successfully sent or the current VTS has finished, the traffic flow steps into another VTSS with probability $q_i$. Moreover, the parameter $p_i$ is referred to as the conditional collision probability, i.e., the probability of a collision seen by a packet belonging to a type $i$ traffic flow at the time it is transmitted on the channel. For simplicity, both $q_i$ and $p_i$ are regarded as constant, which is validated through extensive simulations.

B. Li and R. Battiti

Fig. 1. Markov model for a type i traffic flow in ADCF

In steady state, $d_{j,k}(i) = \lim_{t \to \infty} P\{s_i(t) = j, b_i(t) = k\}$ ($i = 1, \ldots, M$; $j \in [0, m_i]$; $k \in [0, 2^j W_i - 1]$) is the stationary distribution of the backoff states of a type $i$ traffic flow. $P_{VTSS,i}$ is defined as the probability that the traffic flow is in VTSS. Therefore, based on the Markov chain, we have

$$P_{VTSS,i} = d_{0,0}(i)\, q_i / (1 - q_i) \tag{1.1}$$

$$d_{j,0}(i) = p_i^{\,j}\, d_{0,0}(i) \quad (0 < j < m_i), \qquad d_{m_i,0}(i) = p_i^{\,m_i}\, d_{0,0}(i) / (1 - p_i) \tag{1.2}$$

$$d_{j,k}(i) = \frac{2^j W_i - k}{2^j W_i}\, d_{j,0}(i) \tag{1.3}$$

$\tau_i$ is the probability that a type $i$ traffic flow transmits in a randomly chosen time slot. It can be given as

$$\tau_i = \sum_{j=0}^{m_i} d_{j,0}(i) = d_{0,0}(i) / (1 - p_i) \tag{2}$$
Since $P_{VTSS,i} + \sum_{j=0}^{m_i} \sum_{k=0}^{2^j W_i - 1} d_{j,k}(i) = 1$, combining equations 1 and 2, we have

$$\frac{\{(1 - 2p_i)(W_i + 1) + p_i W_i [1 - (2p_i)^{m_i}]\}\, \tau_i}{2(1 - 2p_i)} + P_{VTSS,i} = 1 \tag{3}$$
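The closed form of Eq. (3) can be cross-checked numerically by building the stationary distribution directly from Eqs. (1.1)-(1.3) and imposing $P_{VTSS,i} + \sum_{j,k} d_{j,k}(i) = 1$. The sketch below does this for one traffic type. The parameter values are arbitrary assumptions (with $m_i \ge 1$ and $p_i \ne 1/2$), and the function names are hypothetical.

```python
def stationary_chain(W, m, p, q):
    """Solve Eqs. (1.1)-(1.3) with normalisation; return (d00, tau, P_vtss).

    W: minimum window W_i, m: maximum backoff stage m_i (>= 1),
    p: collision probability p_i, q: VTSS entry probability q_i.
    """
    # d_{j,0} / d_{0,0} from Eq. (1.2); the last stage absorbs repeated collisions.
    ratios = [p**j for j in range(m)] + [p**m / (1 - p)]
    # By Eq. (1.3), sum_k d_{j,k} = d_{j,0} * (2^j W + 1) / 2.
    backoff_mass = sum(r * (2**j * W + 1) / 2 for j, r in enumerate(ratios))
    vtss_ratio = q / (1 - q)                   # Eq. (1.1): P_vtss / d00
    d00 = 1.0 / (backoff_mass + vtss_ratio)    # total probability equals 1
    tau = d00 / (1 - p)                        # Eq. (2)
    return d00, tau, d00 * vtss_ratio


def eq3_lhs(W, m, p, tau, P_vtss):
    """Left-hand side of Eq. (3); should equal 1."""
    bracket = (1 - 2 * p) * (W + 1) + p * W * (1 - (2 * p) ** m)
    return bracket * tau / (2 * (1 - 2 * p)) + P_vtss


d00, tau, P_vtss = stationary_chain(W=16, m=3, p=0.1, q=0.4)
print(eq3_lhs(16, 3, 0.1, tau, P_vtss))   # ~1.0 up to rounding
```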

Extensive simulations show that, even if the packet arrivals of different traffic flows are assumed to be independent, in some cases there are obvious correlations between the behaviors of different traffic flows. Therefore, by introducing compensation factors $\alpha_i > 0$ ($i = 1, \ldots, M$), the packet collision rates can be expressed as

$$p_i = \alpha_i \left[1 - (1 - \tau_i)^{n_i - 1} \prod_{j=1,\, j \ne i}^{M} (1 - \tau_j)^{n_j}\right] \tag{4}$$
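Equation (4) needs only the per-type transmission probabilities. In the sketch below (function name hypothetical), the product $(1-\tau_i)^{n_i-1}\prod_{j\ne i}(1-\tau_j)^{n_j}$ is obtained by dividing the probability that no flow transmits by $(1-\tau_i)$. $\alpha_i$ defaults to 1, the value adopted below for independent flows. The example inputs are hypothetical.

```python
import math


def collision_probs(tau, n, alpha=None):
    """Eq. (4): p_i = alpha_i [1 - (1 - tau_i)^(n_i - 1) prod_{j != i} (1 - tau_j)^n_j].

    tau[i], n[i]: per-flow transmission probability and number of type-i flows.
    alpha defaults to 1 for every type.
    """
    alpha = alpha or [1.0] * len(tau)
    idle_all = math.prod((1 - t) ** k for t, k in zip(tau, n))  # no flow transmits
    return [a * (1 - idle_all / (1 - t)) for a, t in zip(alpha, tau)]


print(collision_probs([0.02, 0.05], [5, 3]))   # hypothetical inputs
```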



In non-saturation state, the total system throughput $S$ and the throughputs $S_i$ ($i = 1, \ldots, M$) contributed by type $i$ traffic flows can be expressed as follows, under the assumption that all the arrived packets are eventually transmitted successfully:

$$S = \sum_{i=1}^{M} S_i = \sum_{i=1}^{M} \frac{n_i P_L}{T_{p,i}} = \frac{\sum_{i=1}^{M} n_i \tau_i (1 - p_i) P_L}{\beta \sigma \prod_{i=1}^{M} (1 - \tau_i)^{n_i} + P_s \sum_{i=1}^{M} n_i \tau_i (1 - p_i) + \left[1 - \beta \prod_{i=1}^{M} (1 - \tau_i)^{n_i} - \sum_{i=1}^{M} n_i \tau_i (1 - p_i)\right] P_c} \tag{5}$$



where $\beta > 0$ is another compensation factor. It should be noted that $\alpha_i$ and $\beta$ are introduced to make our mathematical expressions more rigorous. Extensive experiments show that $\alpha_i$ and $\beta$ can be approximated as one when the system operates in stable states. Moreover, in equation 5, $\sigma$ is the duration of an empty time slot (which is also the duration of an empty VTS), $P_s$ is the average duration of a slot containing a successful packet transmission, and $P_c$ is the average time the channel is sensed busy by each station during a packet collision. We have:

$$P_s = \text{PHY header} + \text{MAC header} + P_L + SIFS + \delta + ACK + DIFS + \delta \tag{6}$$

$$P_c = \text{PHY header} + \text{MAC header} + P_L + DIFS + \delta \tag{7}$$

where $\delta$ is the propagation delay.

We assume that the behaviors of all the traffic flows are independent (simulations show that this assumption approximately holds when the minimum contention window sizes $W_i$ are not very small). In this case, the $\alpha_i$ and $\beta$ introduced above can be approximated as 1. Therefore, given the corresponding offered traffic load (and hence the system throughput $S$), the packet sending rates $\tau_i$ and the corresponding packet collision rates $p_i$ can be determined from equations 4 and 5. Although two sets of solutions can be obtained, only the one with the smaller packet collision rates is preferred. We denote the preferred solution as $O(\tau_i^M, p_i^M)$. For stability, the system should operate close to this solution. Note that $O(\tau_i^M, p_i^M)$ can be completely determined without relying on measurements of other system parameters, such as the packet collision rates $p_i$.



Next, we analyze the packet delay $T_{d,i}$ ($i = 1, \ldots, M$), which is defined as the average duration between the beginning of a backoff procedure and the instant at which the corresponding packet has been successfully sent. Let us consider a type $i$ traffic flow. $n_{VTS,i}$ is the average number of successive VTSs following the successful sending of a packet. It can be given as

$$n_{VTS,i} = \sum_{j=1}^{\infty} j\, q_i^{\,j} (1 - q_i) = \frac{q_i}{1 - q_i} = \frac{P_{VTSS,i}}{d_{0,0}(i)} = \frac{P_{VTSS,i}}{\tau_i (1 - p_i)} \tag{8}$$
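Equation (8) uses the mean of a geometric number of VTSs. The sketch below checks numerically that the truncated series $\sum_{j\ge1} j q_i^j (1-q_i)$ matches $q_i/(1-q_i)$ and that, by Eq. (1.1), this equals $P_{VTSS,i}/d_{0,0}(i)$. The function name and the numbers used are arbitrary illustrations.

```python
def mean_vts_series(q, terms=10_000):
    """Truncated series of Eq. (8): sum_{j>=1} j q^j (1 - q)."""
    return sum(j * q**j * (1 - q) for j in range(1, terms))


q, d00 = 0.6, 0.02                     # arbitrary illustrative values
P_vtss = d00 * q / (1 - q)             # Eq. (1.1)
print(mean_vts_series(q), q / (1 - q), P_vtss / d00)   # all ~1.5
```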

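Considering the case that $P_s \approx P_c$ and $\tau_i \ll 1$, the average duration $T_{VTS,i}$ of a VTS can be approximated as

$$T_{VTS,i} \approx \beta \sigma \prod_{j=1}^{M} (1 - \tau_j)^{n_j} + \left[1 - \beta \prod_{j=1}^{M} (1 - \tau_j)^{n_j}\right] P_s - \tau_i P_s \tag{9}$$

According to equation 5, we have

$$T_{VTS,i} \approx \tau_i (1 - p_i)\, T_{p,i} - \tau_i P_s \tag{10}$$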



Therefore, $T_{d,i}$ can be approximated as

$$T_{d,i} \approx T_{p,i} - n_{VTS,i}\, T_{VTS,i} \approx \frac{n_i P_L}{S_i} (1 - P_{VTSS,i}) + \frac{P_s\, P_{VTSS,i}}{1 - p_i} \tag{11}$$
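As a small numerical aid, the delay estimate $T_{d,i} \approx (n_i P_L / S_i)(1 - P_{VTSS,i}) + P_s P_{VTSS,i}/(1 - p_i)$ can be coded as below. Here $n_i P_L / S_i$ is the mean inter-arrival time $T_{p,i}$ of one type $i$ flow and times are in seconds. The function name and the example values (5 flows, a 1 ms payload time, type $i$ using 10% of the channel time) are hypothetical.

```python
def packet_delay(n_i, P_L, S_i, P_vtss, P_s, p_i):
    """T_d,i ~= (n_i P_L / S_i)(1 - P_VTSS,i) + P_s P_VTSS,i / (1 - p_i)."""
    T_p = n_i * P_L / S_i            # mean packet inter-arrival time T_p,i of one flow
    return T_p * (1 - P_vtss) + P_s * P_vtss / (1 - p_i)


# Hypothetical values: 5 flows, 1 ms payload time, 10% of channel time for type i.
print(packet_delay(n_i=5, P_L=1e-3, S_i=0.1, P_vtss=0.9, P_s=1.2e-3, p_i=0.05))
```

When $P_{VTSS,i} = 0$ (saturation), the estimate collapses to the inter-arrival time $T_{p,i}$ itself, as expected for a queue that is never empty.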



It should be noted that the above estimate of T_{d,i} can be regarded as the average service time of a packet in its transmission queue. It can therefore be applied directly to evaluate the average packet queuing delay (waiting time in the transmission queue) by using a G/G/1 queuing model [11]; this analysis is omitted here because of space limitations.
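The G/G/1 step is omitted above. As one illustration only, the sketch below uses the estimated T_{d,i} as the mean service time in Kingman's heavy-traffic approximation of the mean G/G/1 waiting time. This choice of approximation is ours, not necessarily the one in [11], and the arrival rate and the squared coefficients of variation are assumed inputs:

```python
def kingman_wait(service_mean: float, arrival_rate: float,
                 ca2: float = 1.0, cs2: float = 1.0) -> float:
    """Kingman's approximation of the mean waiting time in a G/G/1 queue.

    service_mean -- mean service time, here the estimated T_{d,i} (s)
    arrival_rate -- packet arrival rate of one station (packets/s), assumed known
    ca2, cs2     -- squared coefficients of variation of inter-arrival and
                    service times (1.0 each corresponds to exponential times)
    """
    rho = arrival_rate * service_mean  # utilization of the transmission queue
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be below 1 for a stable queue")
    return (rho / (1.0 - rho)) * ((ca2 + cs2) / 2.0) * service_mean


if __name__ == "__main__":
    # Hypothetical numbers: T_d,i = 5 ms and 100 packets/s give rho = 0.5.
    print(kingman_wait(0.005, 100.0))
```

With both coefficients equal to 1, the formula reduces to the exact M/M/1 result, which makes a convenient reference point.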



3 Approximation Analysis 

Assume that the system operating point can be approximated as Θ(τ_i^M, p_i^M). Theoretical analysis shows that if the total number of traffic flows Σ_{i=1}^{M} n_i and the packet payload P_L are not too small, it is reasonable to assume that τ_i ≪ 1 [10]. From equation 5, it can be obtained that

<•(1 -P i yTpj=T r (l-p i )T p . (12) 

Under the assumption that τ_i ≪ 1, from equation 4, we have

P* ~ p] (13) 

Therefore, we can make the following approximation, 

hT pA ~TjT p . 

After substituting Θ(τ_i^M, p_i^M) into equation 3, we have



(14) 
I - P„



1 -P, 



VTSS,j 






(15) 



Obviously, when the minimum contention window sizes W_i are large, both P_{VTSS,i} and p_i are small. Therefore, according to equation 11, the packet delay can be approximated as

T_{d,i} \approx \frac{n_i P_L}{S_i}\,(1 - P_{VTSS,i})    (16)



Based on equations 14, 15 and 16, it can be obtained that 

\frac{T_{d,i}}{T_{d,j}} \approx \frac{W_i}{W_j}    (17)



Note that the above approximation holds only when the P_{VTSS,i} are very small. Equation 17 is exactly the same as the approximate result given in [9], which shows that the saturation state can be regarded as an extreme case of the non-saturation state with P_{VTSS,i} = 0.



In the remainder of this section, we show how to set the minimum contention window sizes W_i so as to meet the target packet delay requirements, that is, T_{d,i} < T̄_{d,i}. Combining equations 3 and 11, we have



d,i 



n i P L 

S .. 



n i P L 



1 -Pi S t 



] _ ^-2p,)+P,[\-(2p,P] WT 



(18) 



2(1- 2 Pi) 

As already mentioned, for stability the system should operate near Θ(τ_i^M, p_i^M). Therefore, if it is required that T_{d,i} < T̄_{d,i}, it follows from the above equation that



W, < 



T - 

1 d,i 



Y 



where y = 



,l i P L 



i -p] 



1 ~Pi A 

a-2f>;)+p;-[i-(2 />;r] 

2(1 — 2 p*) 



(19) 



Equation 19 gives upper bounds on the W_i that meet the packet delay requirements T̄_{d,i}.



4 Results and Discussions 

In this section, both numerical and simulation results are presented to validate the proposed analytical model. In our experiments, the system parameters, which are based on IEEE 802.11b, are as follows: MAC header = 272 bits; PHY header = 192 µs; ACK = 112 bits + PHY header; channel bit rate = 11 Mbit/s; propagation delay = 1 µs; slot time = 20 µs; SIFS = 10 µs; and DIFS = 50 µs. In our discrete-event simulation, a single-hop wireless LAN is considered. The system contains n_1 type-1 and n_2 type-2 sending stations, each of which carries exactly one traffic flow. The channel conditions are assumed to be ideal (i.e., no hidden terminals and no capture).
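For orientation, these parameters fix the channel-busy durations that enter slot-based models of this kind. The sketch below assumes basic access (no RTS/CTS) and the commonly used expressions T_s = H + P + SIFS + δ + ACK + DIFS + δ and T_c = H + P + DIFS + δ, where H is the PHY plus MAC header time, P the payload time and δ the propagation delay. The access mode and these formulas are our assumptions, not taken from the text; the 2000-byte payload matches the system parameters listed with Table 1:

```python
# IEEE 802.11b parameters from the text; all times in seconds.
BIT_RATE = 11e6                      # channel bit rate, bit/s
MAC_HEADER = 272 / BIT_RATE          # 272-bit MAC header
PHY_HEADER = 192e-6                  # PHY header
ACK = 112 / BIT_RATE + PHY_HEADER    # 112 bits + PHY header
PROP_DELAY = 1e-6                    # propagation delay delta
SLOT = 20e-6
SIFS = 10e-6
DIFS = 50e-6


def busy_durations(payload_bytes: int) -> tuple[float, float]:
    """Return (T_s, T_c) under the basic-access assumption."""
    header = PHY_HEADER + MAC_HEADER
    payload = payload_bytes * 8 / BIT_RATE
    t_s = header + payload + SIFS + PROP_DELAY + ACK + DIFS + PROP_DELAY
    t_c = header + payload + DIFS + PROP_DELAY
    return t_s, t_c


t_s, t_c = busy_durations(2000)
print(f"T_s = {t_s * 1e6:.1f} us ({t_s / SLOT:.1f} slots), "
      f"T_c = {t_c * 1e6:.1f} us ({t_c / SLOT:.1f} slots)")
```

Both durations are large compared with the 20 µs slot time, which is why idle backoff slots are cheap relative to collisions.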

In the first experiments, two types of traffic flows are considered. Type-1 traffic has priority over type-2 traffic, so a smaller minimum contention window size W_1 is assigned to type-1 traffic and a larger minimum contention window size W_2 = 5W_1 is allocated to type-2 traffic. Equations 3 and 5 are fundamental to this paper; they are validated in Fig. 2 and Fig. 3, respectively. Fig. 2 shows the virtual time slot rates P_{VTSS,i} versus W_1. The P_{VTSS,i} are obtained in two ways: first, by simulation; second, by substituting the packet collision rates p_i and packet sending rates τ_i obtained from simulation into equation 3. System parameters are shown in the figure. The P_{VTSS,i} obtained from equation 3 are very close to the simulated values, which validates the Markov model shown in Fig. 1. When the minimum contention window sizes W_i are very small, however, the differences between the simulated and estimated values grow. This is because in this case the packet collision rates increase dramatically and the system behavior becomes unstable.




Fig. 2. Virtual time slot rates P_{VTSS,i} versus minimum contention window size W_1 (W_2 = 5W_1)

In Fig. 3, the throughputs S_i are obtained in two ways: first, by simulation; second, by substituting the p_i and τ_i obtained from simulations into equation 5. Again, when the W_i are very small, equation 5 does not describe the system behavior well, owing to the instability of the system in this case.

In Fig. 4, the packet delays T_{d,i} are shown versus W_1. Two ways are used to obtain the T_{d,i}: one is simulation; in the other, we estimate the T_{d,i} from equation 11. To use equation 11, the solution Θ(τ_1, τ_2, p_1, p_2) is calculated by
Fig. 3. Throughput S_i versus W_1




Fig. 4. Packet delays T_{d,i} versus minimum contention window size W_1 (W_2 = 5W_1)

using equation 5 with the assumption that α_i and ρ are equal to 1. The obtained solution is then substituted into equation 11. The figure shows that when the W_i are not very small, packet delays are estimated accurately. In this regime, packet delays decrease linearly as the W_i decrease (we say that the system operates in the “Stable State”): as the W_i decrease, time wasted in backoff is converted directly into VTSs without significantly increasing the packet collision rates and packet sending rates. When the W_i are very small, the system behavior is unstable (packet collision rates and packet sending rates increase drastically with a slight decrease of the W_i), so packet delays tend to increase drastically and the delay estimates are no longer accurate. Equation 11 is nevertheless useful, because one does not want the system to operate far from the “Stable State”. How to guarantee that the system operates in the “Stable State” is an interesting topic for future research.
Table 1. Guaranteeing packet delay requirements

T̄_{d,2} (s)   W_1   W_2   T_{d,1} (s)   T_{d,2} (s)
0.030          109   924   0.005052      0.030827
0.025          109   759   0.005085      0.025859
0.020          109   594   0.005126      0.020966
0.015          109   429   0.005204      0.016293
0.010          109   264   0.005313      0.011022
0.005          109    99   0.005504      0.005506



System parameters: P_{L,1} = P_{L,2} = 2000 bytes, n_1 = 5, n_2 = 10, m_1 = m_2 = 7, Tm =
0,020363636 s, Tr, = 0.10181818 s 



Table 1 demonstrates a possible application of our analytical model. Equation 19 provides a way to estimate the upper bounds on the minimum contention window sizes W_i that meet the required packet delays T_{d,i} < T̄_{d,i}. In this example, we first estimate the upper bounds on the W_i based on equation 19. Then, to assess these estimated upper bounds, the actual packet delays T_{d,i} are obtained from simulations with each W_i set equal to its estimated upper bound. Finally, the obtained packet delays T_{d,i} are compared with the required packet delays T̄_{d,i}. In Table 1, the first two columns are the packet delay requirements, the third and fourth columns are the minimum contention window sizes estimated using equation 19, and the last two columns are the achieved packet delays obtained from simulations. The packet delay requirements are approximately met (the achieved type-2 delays exceed their targets by roughly 3% to 10%, with generally larger relative deviations for the tighter targets), which suggests a promising application of the proposed model.
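The deviations quoted above can be recomputed directly from the type-2 values in Table 1 (target T̄_{d,2} and achieved T_{d,2} in each row); a short Python sketch:

```python
# (required T_d2, achieved T_d2) pairs copied from Table 1, in seconds.
TABLE1_TYPE2 = [
    (0.030, 0.030827),
    (0.025, 0.025859),
    (0.020, 0.020966),
    (0.015, 0.016293),
    (0.010, 0.011022),
    (0.005, 0.005506),
]


def relative_errors(rows):
    """Relative deviation (achieved - required) / required for each row."""
    return [(achieved - required) / required for required, achieved in rows]


if __name__ == "__main__":
    for (req, ach), err in zip(TABLE1_TYPE2, relative_errors(TABLE1_TYPE2)):
        print(f"target {req * 1e3:5.1f} ms  achieved {ach * 1e3:6.3f} ms  error {err:+.1%}")
```

All deviations are positive, so the window bounds from equation 19 slightly underestimate the delay in these runs.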



5 Conclusions 

In this paper, a simple model has been proposed to analyze the performance of IEEE 802.11 DCF with service differentiation support in non-saturation states, which provides deeper insight into IEEE 802.11 DCF. When the system operates in stable states, the most important system performance measures, such as packet delays, can be evaluated approximately, which provides an important tool for predicting system performance. Moreover, a practical method for meeting given packet delay requirements has been derived from our theoretical results. Comparisons with simulation results show that this method achieves the specified packet delay requirements with good accuracy. Extensions of this work to practical schemes that adapt rapidly to changing traffic loads are under consideration.
Abstract. We propose a new protocol, named DS-SWAN (Differentiated Services-Stateless Wireless Ad Hoc Networks), to support end-to-end QoS in ad hoc networks connected to one fixed DiffServ domain. DS-SWAN warns nodes in the ad hoc network when congestion becomes excessive for the correct functioning of real-time applications. These nodes react by slowing down best-effort traffic. Furthermore, we present a routing protocol for the ad hoc network, named SD-AODV (Service Differentiation-Ad Hoc On-demand Distance Vector), in which new route requests are suppressed to maintain the desired QoS requirements for real-time flows. Simulation results indicate that DS-SWAN and SD-AODV significantly improve end-to-end delays for real-time flows without starving background traffic.



1 Introduction 

There has been little research on QoS support when a wireless ad hoc network is attached to a fixed IP network. In this context, co-operation between the ad hoc network and the fixed network can facilitate end-to-end QoS support [1]. In the present work, we propose a new protocol, named DS-SWAN, that is based on co-operation between a QoS model named SWAN within the ad hoc network and a Differentiated Services (DiffServ) [2] domain in the fixed network.

The authors of [3] study the behavior of voice traffic in an isolated ad hoc network that uses SWAN. However, there has been no prior work analyzing the transmission of real-time traffic that shares resources with background traffic between a mobile ad hoc network and a fixed IP network.

Routing protocols play an important role in delivering QoS, because network performance depends on how quickly routing protocols can recompute routes between source-destination pairs after topology changes. Therefore, in the present work we also present a new routing protocol for the ad hoc network, named SD-AODV, that interoperates with the QoS scheme (DS-SWAN) to maintain the desired QoS between the wired and the wireless network.



J. Sole-Pareta et al. (Eds.): QofIS 2004, LNCS 3266, pp. 74-83, 2004. 
© Springer-Verlag Berlin Heidelberg 2004
The paper is structured as follows: Section 2 describes the SWAN model. Section 3 presents the new protocol, DS-SWAN (DiffServ-SWAN). Section 4 introduces the new routing protocol SD-AODV. Section 5 explores the dynamics of the system. Finally, Section 6 concludes the paper.



2 SWAN 

SWAN is a stateless scheme designed to provide end-to-end service differentiation in ad hoc networks employing a best-effort distributed wireless MAC [3]. It distinguishes between two traffic classes: real-time and best-effort.

When best-effort packets arrive at a node, they enter a leaky-bucket traffic shaper whose rate is computed by an AIMD (Additive Increase Multiplicative Decrease) rate control algorithm. Every node continuously measures MAC delays, and this information is fed back to the rate controller. Every T seconds, each node increases its transmission rate gradually (additive increase with an increment rate of c bit/s) until the packet delays at the MAC layer become excessive. As soon as the rate controller detects excessive delays, it reduces the shaper rate multiplicatively (multiplicative decrease of r %).
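A minimal Python sketch of this AIMD loop follows. It assumes that the controller is evaluated once per measurement (a simplification of "as soon as" excessive delay is detected) and that "excessive" means the measured MAC delay exceeds a threshold; the threshold, the class name and the example numbers are hypothetical, since the text does not specify them. The lower bound corresponds to the minimum shaper rate m of SWAN mentioned in Section 3:

```python
from dataclasses import dataclass


@dataclass
class AimdShaper:
    """Sketch of SWAN-style AIMD control of the best-effort shaping rate."""
    rate: float             # current leaky-bucket rate, bit/s
    c: float                # additive increment per step, bit/s
    r_percent: float        # multiplicative decrease, in percent
    min_rate: float         # minimum shaper rate m, bit/s
    delay_threshold: float  # MAC delay treated as "excessive", s (hypothetical)

    def update(self, mac_delay: float) -> float:
        """Apply one control step given the latest measured MAC delay."""
        if mac_delay > self.delay_threshold:
            # Multiplicative decrease by r %, never below the minimum rate m.
            self.rate = max(self.min_rate, self.rate * (1.0 - self.r_percent / 100.0))
        else:
            # Additive increase by c bit/s.
            self.rate += self.c
        return self.rate


if __name__ == "__main__":
    shaper = AimdShaper(rate=100e3, c=10e3, r_percent=50.0, min_rate=20e3, delay_threshold=0.02)
    for d in (0.005, 0.005, 0.030, 0.005):
        print(f"MAC delay {d * 1e3:.0f} ms -> rate {shaper.update(d) / 1e3:.1f} kbit/s")
```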

Rate control restricts the bandwidth of best-effort traffic so that real-time applications can use the bandwidth they require. At the same time, bandwidth not used by real-time applications can be used efficiently by best-effort traffic.

For real-time traffic, SWAN uses sender-based admission control. This mechanism sends an end-to-end request/response probe along the existing route to estimate the available bandwidth at each node and then determines whether a new real-time session should be admitted.
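The probing step can be sketched as follows, assuming that each hop contributes an available-bandwidth estimate and that the probe keeps the minimum (bottleneck) value. How SWAN estimates the per-node available bandwidth is outside this sketch, and the function names are hypothetical:

```python
def probe_bottleneck(per_hop_available_bw: list[float]) -> float:
    """Value carried back by the probe: the smallest available bandwidth seen."""
    bottleneck = float("inf")
    for bw in per_hop_available_bw:   # each hop lowers the recorded value if needed
        bottleneck = min(bottleneck, bw)
    return bottleneck


def admit(per_hop_available_bw: list[float], requested_bw: float) -> bool:
    """Admit the new real-time session only if the bottleneck can carry it."""
    if not per_hop_available_bw:
        raise ValueError("the route must contain at least one hop")
    return probe_bottleneck(per_hop_available_bw) >= requested_bw


if __name__ == "__main__":
    route = [500e3, 200e3, 300e3]      # hypothetical per-hop estimates, bit/s
    print(admit(route, 150e3), admit(route, 250e3))
```

Keeping only the minimum lets the probe carry a single number regardless of route length, which fits a stateless design.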



3 DS-SWAN 

We consider a scenario in which background traffic and real-time VBR (Variable Bit Rate) traffic are transmitted as the mobile nodes in the ad hoc network communicate with fixed hosts located in the fixed network (see Fig. 2). For simplicity, we consider a single DiffServ domain and assume that the traffic flows from the ad hoc network to correspondent nodes in the fixed network.

For real-time traffic, the DiffServ service class is the EF (Expedited Forwarding) PHB (Per-Hop Behavior), which provides a low-loss, low-latency, low-jitter, assured-bandwidth end-to-end service. The EF aggregates are policed with a token bucket at the ingress edge router, and traffic that exceeds the profile is dropped.
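A token bucket policer of this kind can be sketched in a few lines. The sketch assumes a byte-based bucket that starts full and drops (rather than remarks) non-conforming packets; the example uses the CIR and CBS values given in Section 5 (200 kbit/s and 1000 bytes), and the class name is hypothetical:

```python
class TokenBucketPolicer:
    """Single-rate token bucket meter: conforming packets pass, excess is dropped."""

    def __init__(self, cir_bps: float, cbs_bytes: int) -> None:
        self.fill_rate = cir_bps / 8.0    # tokens (bytes) added per second
        self.depth = float(cbs_bytes)     # bucket size (committed burst size)
        self.tokens = float(cbs_bytes)    # assumption: the bucket starts full
        self.last = 0.0                   # time of the previous update, s

    def conforms(self, now: float, size_bytes: int) -> bool:
        """Refill for the elapsed time, then pass the packet if enough tokens remain."""
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False                      # out of profile: the packet is dropped


if __name__ == "__main__":
    meter = TokenBucketPolicer(cir_bps=200e3, cbs_bytes=1000)
    arrivals = [(0.0, 400), (0.0, 400), (0.0, 400), (0.01, 400)]
    print([meter.conforms(t, size) for t, size in arrivals])
```

A small CBS such as 1000 bytes tolerates only short bursts, so bursty VBR aggregates can see drops even when their average rate is below the CIR.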

The number of packets dropped at the ingress edge router and the end-to-end delay of the real-time connections are related to the QoS parameters of the SWAN model in the ad hoc network. We observe that if the rate of the best-effort leaky-bucket traffic shaper is lower, best-effort traffic is rate-controlled more effectively, real-time traffic is less affected by best-effort traffic, and it is able to maintain the required QoS parameters. We propose a new protocol that enables co-operation between the DiffServ architecture in the fixed network and the SWAN scheme described above in the ad hoc network to improve end-to-end QoS support. In the proposed protocol, DS-SWAN, the ingress edge router periodically monitors the number of EF packets dropped by its token bucket meter, while the correspondent nodes in the fixed IP network periodically monitor the average end-to-end delays of the real-time flows.

We have selected a specific type of real-time application that is bursty and that carries end-to-end delay information: VBR Voice-over-IP (VoIP) [4]. ITU-T Recommendation G.114 states that the end-to-end delay should be kept below 150 ms to maintain acceptable conversation quality [5]. Also, for Pulse Code Modulation encoding with the G.711 codec, the packet loss rate should never exceed 5% [6].

In DS-SWAN, a destination node sends a QoS_LOST warning message to the ingress edge router when the end-to-end delay of a VoIP flow exceeds 140 ms.

We observed in initial simulation runs that the fraction of VoIP packets dropped in the ad hoc network is usually well below 1% when SWAN is used. Therefore, we specify that if the fraction of VoIP packets dropped at the ingress edge router is below 4% and the router has received a QoS_LOST message, it forwards the QoS_LOST message to the ad hoc network to indicate that the system is too congested to maintain the desired QoS (because of excessive delays in the ad hoc network). When the loss at the edge router exceeds 4%, the end-to-end packet loss rate will exceed 5% (the ad hoc network adds up to roughly another 1%), so the VoIP quality will be degraded anyway, and sending QoS_LOST messages to act on the end-to-end delays makes no sense because they will not reduce the packet loss rate.
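The warning logic of the destination and of the ingress edge router can be summarized as two predicates. This is a sketch under our own assumptions: the loss at the edge router is measured as the fraction of offered EF packets dropped during a monitoring period, and the function names are hypothetical:

```python
DELAY_WARNING_S = 0.140   # destination warns above 140 ms (below the 150 ms G.114 bound)
EDGE_LOSS_LIMIT = 0.04    # edge router relays warnings only below 4 % EF loss


def destination_should_warn(e2e_delay_s: float) -> bool:
    """Destination side: send QoS_LOST to the ingress edge router?"""
    return e2e_delay_s > DELAY_WARNING_S


def edge_router_should_relay(dropped: int, offered: int, got_qos_lost: bool) -> bool:
    """Edge router side: forward QoS_LOST into the ad hoc network?

    Above the loss limit the 5 % end-to-end budget is assumed to be exceeded
    anyway, so throttling best-effort traffic would not help.
    """
    if offered == 0:
        return False
    return got_qos_lost and dropped / offered < EDGE_LOSS_LIMIT
```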

When a node in the ad hoc network receives the QoS_LOST message, it reacts by modifying the parameter values of the AIMD rate control algorithm described above. In DS-SWAN, every time a QoS_LOST message is received, the node decreases the value of c by Δc− bit/s, down to a certain minimum value. When no QoS_LOST message is received during T seconds, the node increases the value of c by Δc+ bit/s until the initial value is reached.

When a wireless node receives a QoS_LOST message, it also increases the value of r by Δr+, up to a maximum value. When no QoS_LOST message has been received in the period T, the value of r is decreased by Δr− until the initial value is reached.

SWAN has a minimum rate m for the best-effort leaky-bucket traffic shaper. In DS-SWAN, nodes are also allowed to reduce m. When a node receives a QoS_LOST message, it reduces m by Δm− bit/s. However, m is kept above a minimum value of m_0 bit/s, and it is increased by Δm+ bit/s every T seconds, up to its initial value, when the mobile node receives no warning message within T seconds. Thus, the SWAN parameter values are changed dynamically according to the traffic conditions not only in the ad hoc network but also in the fixed network.
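The per-node reaction to QoS_LOST messages can be sketched as a small state machine over c, r and m. The class and parameter names are hypothetical; the bounds (a minimum for c, a maximum for r, the floor m_0 for m, and the initial values as limits on recovery) follow the description above:

```python
from dataclasses import dataclass


@dataclass
class DsSwanParams:
    """Hypothetical holder for the AIMD parameters that DS-SWAN adapts."""
    c_init: float      # initial additive increment c, bit/s
    r_init: float      # initial multiplicative decrease r, percent
    m_init: float      # initial minimum shaper rate m, bit/s
    c_min: float       # lower bound for c
    r_max: float       # upper bound for r
    m_min: float       # floor m_0 for m
    dc_minus: float
    dc_plus: float
    dr_plus: float
    dr_minus: float
    dm_minus: float
    dm_plus: float

    def __post_init__(self) -> None:
        # Current values start at the initial (SWAN default) settings.
        self.c, self.r, self.m = self.c_init, self.r_init, self.m_init

    def on_qos_lost(self) -> None:
        """A QoS_LOST message arrived: throttle best-effort traffic harder."""
        self.c = max(self.c_min, self.c - self.dc_minus)
        self.r = min(self.r_max, self.r + self.dr_plus)
        self.m = max(self.m_min, self.m - self.dm_minus)

    def on_quiet_period(self) -> None:
        """No QoS_LOST during the last T seconds: drift back to the initial values."""
        self.c = min(self.c_init, self.c + self.dc_plus)
        self.r = max(self.r_init, self.r - self.dr_minus)
        self.m = min(self.m_init, self.m + self.dm_plus)
```

Each warning makes all three knobs more conservative at once, and recovery is gradual, so a single burst of warnings keeps best-effort traffic throttled for several periods T.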
Fig. 1. Example network



We have designed two different versions of DS-SWAN:

• “DS-SWAN - CBR sources”: all the CBR sources and the intermediate nodes along the routes are warned, so that they throttle their best-effort traffic.

• “DS-SWAN - VoIP sources + neighbours”: the edge router sends a QoS_LOST message only to the VoIP sources generating flows that have problems keeping their end-to-end delays under 150 ms and to the intermediate nodes along their routes. These nodes then forward the QoS_LOST message as a broadcast packet to all their neighbours, because the neighbours may be contending with them for medium access. A node throttles its best-effort traffic only when it receives a QoS_LOST message as a broadcast packet.

The different behavior of the two DS-SWAN versions is illustrated in Fig. 1, which shows an example ad hoc network in which one real-time VoIP flow and three best-effort CBR flows send packets toward the Internet through the gateway. First, consider the version “DS-SWAN - CBR sources”. If the VoIP flow has problems keeping its end-to-end delay under 150 ms, QoS_LOST messages are sent to the CBR sources and the intermediate nodes along the routes in the ad hoc network, so nodes A, B, C, D, E, F, G, H, I, J and the gateway throttle their best-effort traffic.

Although nodes H and I do not compete for medium access with the nodes along the route of the problematic VoIP flow, they also slow down their CBR traffic, because in this DS-SWAN version they have been warned. If instead the second version (“DS-SWAN - VoIP sources + neighbours”) is applied, the CBR sources and intermediate nodes that are not neighbours of the problematic VoIP source or of the intermediate nodes along its route are not warned, so in the example nodes H and I do not throttle their best-effort traffic. Note, however, that with this DS-SWAN version some nodes, like node E in the example, receive the QoS_LOST broadcast packet more than once because they are neighbours of several warned nodes (node E has nodes B and C as neighbours), so they adjust the leaky-bucket parameters used to rate-control best-effort traffic several times.
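The multiple-receipt effect of the second version can be illustrated by counting, for each node, how many QoS_LOST broadcasts it hears when every warned node rebroadcasts once to its neighbours. The topology below is hypothetical (it is not the exact graph of Fig. 1), but, as with node E, a node adjacent to two warned nodes is counted twice, and isolated nodes such as H and I are never reached:

```python
from collections import Counter

# Hypothetical neighbourhoods: S is the problematic VoIP source, B and C relay its flow.
NEIGHBOURS = {
    "S": {"B"},
    "B": {"S", "C", "E"},
    "C": {"B", "E", "GW"},
    "E": {"B", "C"},
    "GW": {"C"},
    "H": {"I"},
    "I": {"H"},
}
WARNED = {"S", "B", "C"}  # nodes that received the unicast QoS_LOST from the edge router


def broadcast_receipts(warned: set[str], neighbours: dict[str, set[str]]) -> Counter:
    """Each warned node rebroadcasts QoS_LOST once to all its neighbours;
    return how many copies every node receives (one throttling step per copy)."""
    counts = Counter()
    for node in warned:
        for nb in neighbours.get(node, set()):
            counts[nb] += 1
    return counts


if __name__ == "__main__":
    print(dict(broadcast_receipts(WARNED, NEIGHBOURS)))
```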
4 SD-AODV (Service Differentiation-AODV)



We have used the Internet draft “Global Connectivity for IPv6 Mobile Ad Hoc Networks” [7] to provide Internet access to mobile ad hoc networks by modifying the Ad Hoc On-demand Distance Vector (AODV) routing protocol [8]. AODV is a best-effort routing protocol that does not provide QoS to real-time traffic.

AODV uses the smallest number of wireless hops towards a destination as the primary metric for selecting a route, regardless of traffic congestion. However, real-world experiments have shown that the shortest-hop-count metric often chooses routes poorly, leading to low-throughput, unreliable paths. One reason the shortest-hop-count metric performs below par is that it does not account for traffic load during route selection.

AODV may well select as the shortest route between source and destination a route that contains congested nodes, or nodes carrying real-time flows that have problems keeping their end-to-end delays. Consequently, the protocol clearly needs to be modified so that it can interwork with the QoS interaction model to maintain the average end-to-end delays of real-time flows.

We consider that the QoS interaction model “DS-SWAN - VoIP sources + neighbours” is applied to our system. In this version, a node receives a QoS_LOST message either because it is a problematic VoIP source or a node along the route of a source that has problems keeping its end-to-end delay under 150 ms, or because it is a neighbour of such nodes and is contending with them for medium access, producing congestion. Under such conditions, as we have observed, nodes consume more energy and their packet loss increases together with their end-to-end delays. DS-SWAN makes it possible to mitigate the effects of congestion and maintain the desired quality for VoIP flows by delaying the access of best-effort traffic to the MAC layer and consequently to the medium. Furthermore, additional actions should be taken in conjunction with DS-SWAN, not only to reduce the existing congestion in problematic regions but also to prevent new traffic load from increasing it.

We present a simple and scalable routing protocol named SD-AODV that takes advantage of the co-operation between SWAN and DiffServ to prevent the degree of congestion in the network from growing.

The goal of SD-AODV is simple: to redirect new routes away from nodes that have received a QoS_LOST broadcast message, if these nodes have congestion problems (gateways excepted). We consider a node to have congestion problems if its average MAC delay during the RTS-CTS-DATA-ACK cycle exceeds a predefined value D_MAX. The MAC delay [9] can be estimated as the total accumulated defer time plus the time to acknowledge the packet if no collision occurs. For example, if RTS/CTS is enabled:



d = t_{defer} + t_{RTS} + t_{CTS} + t_{packet} + t_{ACK} + 3\,t_{SIFS} + 4\tau    (1)



where τ estimates the maximum propagation delay. Each node independently monitors and computes its average MAC delay and can declare itself a congested node if this delay is excessive. SD-AODV acts in a fully distributed manner, suppressing new route requests towards these congested nodes to ensure that newly routed traffic does not increase the congestion at the bottlenecks, so that the desired QoS parameters for real-time traffic continue to be maintained. This naive suppression of route creation may prevent the use of the only possible path between two hosts, but we argue that a new VoIP or CBR flow cannot be offered a route towards the destination at the expense of ongoing real-time flows.
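The congestion test and the route-request suppression can be sketched per node as follows. Assumptions that are ours, not the text's: the cycle delay uses three SIFS gaps and one propagation delay τ per frame, the average is a running mean over all recorded delays, a node counts as marked once it has received a QoS_LOST broadcast, and suppression is modelled as the node declining to rebroadcast new route requests (RREQs). D_MAX = 20 ms is the value used in Section 5; all names are hypothetical:

```python
D_MAX = 0.020  # 20 ms, the threshold used in the simulations


def mac_delay_rts_cts(t_defer: float, t_rts: float, t_cts: float, t_packet: float,
                      t_ack: float, t_sifs: float, tau: float) -> float:
    """Collision-free RTS-CTS-DATA-ACK delay: defer time, four frames,
    three SIFS gaps and (assumed) one propagation delay tau per frame."""
    return t_defer + t_rts + t_cts + t_packet + t_ack + 3 * t_sifs + 4 * tau


class SdAodvNode:
    """Per-node state for the SD-AODV congestion test (hypothetical API)."""

    def __init__(self, is_gateway: bool = False) -> None:
        self.is_gateway = is_gateway
        self.got_qos_lost_broadcast = False
        self._delay_sum = 0.0
        self._delay_count = 0

    def record_mac_delay(self, d: float) -> None:
        self._delay_sum += d
        self._delay_count += 1

    def average_mac_delay(self) -> float:
        return self._delay_sum / self._delay_count if self._delay_count else 0.0

    def is_congested(self) -> bool:
        """Gateways are exempt; others need a QoS_LOST broadcast and delay > D_MAX."""
        if self.is_gateway or not self.got_qos_lost_broadcast:
            return False
        return self.average_mac_delay() > D_MAX

    def rebroadcast_rreq(self) -> bool:
        """Suppress (do not rebroadcast) new route requests while congested."""
        return not self.is_congested()
```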

Congested zones arise from excessive contention for the shared medium and are transient in nature, because flows are rerouted as the network topology and the traffic load change. When there is a link failure (for example, due to node mobility) and a route towards a destination is broken, the routing protocol has to find a new route towards the destination (or use a route stored in a routing table). If some or all nodes along the old route were marked as congested, these nodes are now unmarked, because the traffic and topology conditions have changed and it is not possible to know a priori whether they will continue to experience congestion. The Differentiated Services architecture continues co-operating with the SWAN model, so DS-SWAN receives updated information about the QoS parameter values in the ad hoc network dynamically, and the unmarked nodes can be marked as congested again in the future if necessary.

SD-AODV also reduces the energy consumption of congested nodes, so these nodes do not exhaust their energy resources prematurely, and the risk of a premature network partition is reduced.

We observe that the selection of congested nodes depends on the interaction between the Differentiated Services architecture and the SWAN model, and co-operation between DS-SWAN and SD-AODV is absolutely necessary for SD-AODV to prevent traffic from concentrating at certain nodes. SD-AODV interworks with the existing QoS model to maintain the desired QoS parameters for real-time flows. SD-AODV is a modification of the AODV routing protocol, but any routing protocol working in the ad hoc network could have been modified instead. SD-AODV cannot be considered a QoS routing protocol, because it does not try to provide QoS support or QoS routes; however, with the aid of DS-SWAN, it contributes to maintaining QoS in the ad hoc network.



5 Simulations 

We have run simulations with the NS-2 tool [10] to investigate the performance of DS-SWAN with a relatively realistic physical layer model.

The system framework is shown in Fig. 2. We consider a single DiffServ domain (DS-domain) covering the whole network between the correspondent hosts and the two wireless gateways. The chosen scenario consists of 20 mobile nodes, 2 gateways, 3 fixed routers and 3 correspondent hosts. In this work, we use a hybrid gateway discovery method [11] to find a gateway. The mobile nodes are uniformly distributed in a rectangular region of 700 m by 500 m. Each mobile node selects a random destination within the area and moves toward it at a velocity uniformly distributed between 0 and 3 m/s. Upon reaching the destination, the node pauses for 20 s, selects another destination and repeats the process. The dynamic routing algorithm is AODV [8], and the wireless links are IEEE 802.11b.

Fig. 2. Simulation framework.

Background traffic is generated by 13 of the mobile hosts, while VBR VoIP traffic is generated by 15 of the mobile hosts. The destinations of the background and VoIP flows are chosen randomly among the three hosts in the wired network.

For the voice calls, we use the ITU G.711 a-law codec [4]. The VoIP traffic is modelled as a source with exponentially distributed on and off periods with averages of 1.004 s and 1.587 s, respectively. Packets have a constant size and are generated at a constant inter-arrival time during the on period. The VoIP connections are activated at a starting time chosen from a uniform distribution in [10 s, 15 s]. Background traffic is Constant Bit Rate (CBR) with a rate of 48 kbit/s and a packet size of 120 bytes. To avoid synchronization, the CBR flows have starting times chosen randomly from the interval [15 s, 20 s] for the first source, [20 s, 25 s] for the second source, and so on.
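Two quantities follow directly from these traffic parameters: the long-run fraction of time a VoIP source is in the on state, 1.004/(1.004 + 1.587) ≈ 0.39, and the CBR packet rate, 48 kbit/s / (120 × 8 bit) = 50 packets/s. The Python sketch below computes both and cross-checks the activity factor with a seeded simulation of exponential on/off periods (the function names are our own):

```python
import random

MEAN_ON_S = 1.004    # mean talk-spurt (on) duration
MEAN_OFF_S = 1.587   # mean silence (off) duration
CBR_RATE_BPS = 48_000
CBR_PACKET_BYTES = 120


def activity_factor(mean_on: float = MEAN_ON_S, mean_off: float = MEAN_OFF_S) -> float:
    """Long-run fraction of time an on/off source spends in the on state."""
    return mean_on / (mean_on + mean_off)


def simulate_activity(duration_s: float, seed: int = 1) -> float:
    """Fraction of [0, duration_s) spent 'on' with exponential on/off periods."""
    rng = random.Random(seed)
    t, talking = 0.0, 0.0
    while t < duration_s:
        on = rng.expovariate(1.0 / MEAN_ON_S)    # expovariate takes the rate 1/mean
        off = rng.expovariate(1.0 / MEAN_OFF_S)
        talking += min(on, duration_s - t)       # clip the last spurt at the horizon
        t += on + off
    return talking / duration_s


cbr_packets_per_s = CBR_RATE_BPS / (CBR_PACKET_BYTES * 8)  # 50 packets/s

if __name__ == "__main__":
    print(f"analytical {activity_factor():.4f}, simulated {simulate_activity(1e5):.4f}")
```

The activity factor of about 0.39 means that a VoIP source is silent most of the time, which is why the VBR voice aggregate is bursty at the edge router.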

Shaping of EF and BE traffic at the edge router is done in two separate drop-tail queues. The EF and BE aggregates are policed with a token bucket meter with CBS = 1000 bytes and CIR = 200 kbit/s.

We ran 40 simulations to assess the end-to-end delay and packet loss of VoIP traffic and the throughput of CBR traffic. We evaluated and compared the performance of SWAN (Case 1) with the two implementations of DS-SWAN discussed in Section 3: “DS-SWAN - CBR sources” (Case 2) and “DS-SWAN - VoIP sources + neighbours” (Case 3).

Fig. 3 shows the average end-to-end delay of VoIP traffic in the three cases. With SWAN, the end-to-end delays increase progressively because the system is congested with VoIP flows and background traffic. From second 115 until the end of the simulation, the end-to-end delays are too high for acceptable conversation quality [5]. With DS-SWAN, the end-to-end delays of the VoIP flows are reduced because some nodes in the ad hoc network are warned and react by throttling their best-effort traffic. As a result, the VoIP flows are able to maintain their QoS parameters and achieve acceptable conversation quality. In Case 3, the average end-to-end delays are lower than in Case 2 because some nodes receive the same QoS_LOST broadcast message more than once. In Case 2, on the other hand, some nodes with CBR packets slow down this traffic unnecessarily: they receive the QoS_LOST message even though they may not be contending with the VoIP flows that have problems keeping their delays. Case 3 shows the best results because this DS-SWAN version acts only when and where it is needed.

Fig. 3. Average end-to-end delay for VoIP traffic: SWAN (Case 1) vs. DS-SWAN (Cases 2 and 3).

Fig. 4. Average throughput for background traffic: SWAN (Case 1) vs. DS-SWAN (Cases 2 and 3).

Fig. 4 shows the average throughput of background traffic. With DS-SWAN, the average throughput of this traffic is lower than with SWAN, because some nodes in the ad hoc network react to warnings by decreasing the rate of their best-effort traffic shaper.

In Case 2, the average throughput is sometimes lower than in Case 3 because all nodes with best-effort traffic rate-control their flows. In other time intervals, however, the average throughput is higher in Case 2 than in Case 3 because some nodes receive the same QoS_LOST message as a broadcast packet more than once. Therefore, the reduction in average CBR throughput of DS-SWAN compared with SWAN depends on the number of CBR sources in Case 2, and in Case 3 on the number of nodes that receive a QoS_LOST broadcast message more than once and on how many times they receive it. In any case, there is no starvation of background traffic.

The packet loss rate for VoIP was well below the required 5% in all simulations. 

Next, we ran 40 simulations to assess the end-to-end delay of VoIP traffic and the throughput of CBR traffic. We evaluated and compared the performance of a system using “DS-SWAN - VoIP sources + neighbours” as the QoS scheme and AODV as the routing protocol (Case 1) with a system using the same QoS scheme but SD-AODV as the routing protocol (Case 2). The parameter D_MAX has a great impact on the number of congested nodes detected; in our simulations it is set to 20 ms. Each node computes its average MAC delay over the course of the simulation.

Fig. 5. Average end-to-end delay for VoIP traffic: DS-SWAN (VoIP sources + neighbours) and AODV (Case 1) vs. DS-SWAN (VoIP sources + neighbours) and SD-AODV (Case 2).

Fig. 6. Average throughput for background traffic: DS-SWAN (VoIP sources + neighbours) and AODV (Case 1) vs. DS-SWAN (VoIP sources + neighbours) and SD-AODV (Case 2).

Fig. 5 shows the average end-to-end delays for VoIP. There is a significant
improvement in the average end-to-end delays for VoIP traffic when SD-AODV is
used as routing protocol (Case 2) in comparison with Case 1. The reason for this
behaviour is as follows. Some VoIP traffic sources have trouble meeting their
end-to-end delay requirements. Once these sources, the nodes along their routes,
and the neighbours of those nodes that compete for medium access are found to be
congested, new route requests towards these nodes are suppressed. As a result,
these nodes are not overloaded with more CBR or VoIP traffic flows. Besides, the
probability that new VoIP flows experience congestion is reduced, because the new
routes for these flows avoid previously declared congested nodes.
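A small Python sketch can illustrate the route-avoidance effect described above. It is a hypothetical illustration and not the SD-AODV specification. The function names, the breadth-first flooding of route requests (RREQs), and the rule that an intermediate node whose average MAC delay exceeds $D_{MAX}$ silently drops RREQs are all assumptions made for the example.

```python
from collections import deque

D_MAX = 0.020  # seconds; the 20 ms threshold used in the simulations above


def is_congested(avg_mac_delay: float, d_max: float = D_MAX) -> bool:
    """Assumed rule: a node is congested if its average MAC delay exceeds D_MAX."""
    return avg_mac_delay > d_max


def discover_route(adj, src, dst, avg_mac_delay):
    """Flood an RREQ breadth-first from src; congested relays do not forward it.

    adj: dict mapping each node to a list of its neighbours.
    avg_mac_delay: dict mapping nodes to their average MAC delay in seconds
    (nodes missing from the dict are treated as uncongested).
    Returns the route as a list of nodes, or None if no route is found.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:  # walk the reverse path back to src
                route.append(node)
                node = parent[node]
            return route[::-1]
        # Suppression: a congested intermediate node does not rebroadcast the RREQ.
        if node != src and is_congested(avg_mac_delay.get(node, 0.0)):
            continue
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Two candidate relays A and B between S and D; A is congested (30 ms).
adj = {"S": ["A", "B"], "A": ["S", "D"], "B": ["S", "D"], "D": ["A", "B"]}
print(discover_route(adj, "S", "D", {"A": 0.030, "B": 0.005}))  # ['S', 'B', 'D']
```

Without congestion at A, the breadth-first flood would return the route through A, because A is listed first.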

Fig. 6 shows the average throughput for CBR best-effort traffic. In both cases
there is no starvation of best-effort traffic. However, when SD-AODV is used as
routing protocol, new CBR traffic sources select routes that avoid congested
nodes. As a result, the average throughput increases with respect to Case 1,
because it is less probable that these sources have to throttle their flows.
6 Conclusions

The combination of DS-SWAN and SD-AODV reduces the average end-to-end delays 
of VoIP flows and improves the average throughput for best-effort traffic. For this 
reason, it is recommended to combine a QoS interaction model such as DS-SWAN
with such a routing scheme in order to improve network performance. As future
work we want to show how the DS-SWAN QoS model and the SD-AODV routing 
protocol perform in a range of configurations under a variety of representative traffic 
loads and in the case of a change of traffic mix. 
Abstract. Medium Access Control (MAC) policies in which the
scheduling time slots are allocated irrespective of the underlying
topology are suitable for ad-hoc networks, where nodes can enter, leave or
move inside the network at any time. Topology-unaware MAC policies
that allocate slots deterministically or probabilistically have been
proposed in the past and evaluated under heavy traffic assumptions. In this
paper, the heavy traffic assumption is relaxed and the system throughput
achieved by these policies is derived as a function of the traffic load. The
presented analysis establishes the conditions and determines the values of
the access probability for which the system throughput under the
Probabilistic Policy is not only higher than that under the Deterministic
Policy but also close to the maximum achievable, provided that the traffic
load and the topology density of the network are known. When they are not
known, sub-optimal solutions are also provided. Simulation results for a
variety of topologies with different characteristics support the claims and
the expectations of the analysis and show the comparative advantage of the
Probabilistic Policy over the Deterministic Policy.



1 Introduction 

The design of an efficient Medium Access Control (MAC) protocol for ad-hoc
networks is challenging. In these networks no infrastructure is present, and
nodes are free to enter, leave or move inside the network without prior
configuration. These idiosyncrasies allow for certain choices in the MAC design
depending on the particular environment. It is therefore not surprising that
several MAC protocols have been proposed so far. Various MAC protocols are
widely employed in ad-hoc networks, such as [1], [2], [3], [4], [5], [7]. In
general, finding optimal solutions to the time slot assignment problem often
results in NP-hard problems [6], which are similar to the $n$-coloring problem
in graph theory.

Topology-unaware TDMA scheduling schemes determine the scheduling time
slots irrespective of the underlying topology. In particular, they are
irrespective of the scheduling time slots assigned to neighbor nodes. The
topology-unaware scheme presented in [8] exploits the mathematical properties
of polynomials with coefficients from finite Galois fields to randomly assign
scheduling time slot sets to each node of the network. For each node, it is
guaranteed that at least one time slot in a frame is collision-free [8]. Another
scheme, proposed in [9], maximizes the minimum guaranteed throughput. However,
both schemes employ a deterministic policy (referred to as the Deterministic
Policy) for the utilization of the assigned time slots. This policy fails to
utilize non-assigned time slots that could result in successful transmissions
[10]. Therefore, a new policy was proposed that probabilistically utilizes the
non-assigned time slots according to a common access probability $p$ (referred
to as the Probabilistic Policy).

All the aforementioned works ([8], [9], [10]) have focused on heavy traffic
conditions, and no work has been conducted for the non-heavy traffic case. In
this paper, the probability $\lambda$, $0 < \lambda \le 1$, that a node has data for
transmission during one time slot is considered. This probability $\lambda$ is
also referred to as the traffic load.

In Section 2 both transmission policies are described. In Section 3, analyt- 
ical expressions for the system throughput as a function of the traffic load are 
derived for both policies. In Section 4, the conditions for the existence of an effi- 
cient range of values for the access probability p (values of the access probability 
p under which the Probabilistic Policy outperforms the Deterministic Policy) are 
established through an approximate analysis. Furthermore, this analysis deter- 
mines the value of the access probability that maximizes the system throughput 
provided that the traffic load and the topology density are known. In Section 5 
the case when the traffic load and/or the topology density are not known is con- 
sidered and expressions are derived such that the system throughput under the 
Probabilistic Policy, even though not maximized, is higher than that under the 
Deterministic Policy. Simulation results, presented in Section 6 for a variety of 
topologies with different characteristics, support the claims and expectations ad- 
dressed by the results of the analysis in the previous sections. Section 7 presents 
the conclusions. 



2 Transmission Policies 



An ad-hoc network may be viewed as a time-varying multihop network and may
be described in terms of a graph $G(V, E)$, where $V$ denotes the set of nodes
and $E$ the set of links between the nodes at a given time instance. Let $|X|$
denote the number of elements in set $X$, and let $N = |V|$ denote the number
of nodes in the network. Let $S_u$ denote the set of neighbors of node $u$,
$u \in V$. These are the nodes $v$ to which a direct transmission from node $u$
(transmission $u \to v$) is possible. Let $D$ denote the maximum number of
neighbors of a node; clearly $|S_u| \le D$, $\forall u \in V$. Time is divided
into time slots of fixed duration. Collisions with other transmissions are
considered to be the only reason for a transmission not to be successful
(corrupted). It has been shown [10] that transmission $u \to v$ is corrupted in
time slot $i$ if at least one transmission $x \to \chi$, with
$x \in S_v \cup \{v\} - \{u\}$ and $\chi \in S_x$, takes place in time slot $i$.
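The collision rule can be written as a short Python check. This is a sketch under assumed data structures. The dictionary `neighbors` maps each node $x$ to its neighbour set $S_x$, with links assumed symmetric. The dictionary `transmissions` maps every node transmitting in the slot to its receiver $\chi$. Including $v$ itself in the interferer set means that the receiver cannot transmit and receive in the same slot.

```python
def is_corrupted(u, v, transmissions, neighbors):
    """True if transmission u -> v is corrupted in the current slot.

    Rule from [10]: u -> v fails if some node x in S_v | {v} - {u}
    transmits (to any chi in S_x) in the same slot.
    """
    interferers = (neighbors[v] | {v}) - {u}
    for x, chi in transmissions.items():
        if chi not in neighbors[x]:
            raise ValueError(f"{x} -> {chi} is not a link")
        if x in interferers:
            return True
    return False


# Line topology 1 - 2 - 3 - 4.
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(is_corrupted(1, 2, {1: 2, 4: 3}, neighbors))  # False: node 4 is not heard by 2
print(is_corrupted(1, 2, {1: 2, 3: 4}, neighbors))  # True: node 3 is a neighbour of 2
```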

According to the transmission policy proposed in [8] and [9], each node
$u \in V$ is randomly assigned a unique polynomial $f_u$ of degree $k$ with
coefficients from a finite Galois field of order $q$ ($GF(q)$). Polynomial $f_u$
is represented as $f_u(x) = \sum_{i=0}^{k} a_i x^i \pmod q$ [9], where
$a_i \in \{0, 1, 2, \ldots, q-1\}$. Parameters $q$ and $k$ are calculated based
on $N$ and $D$, according to the algorithm presented either in [8] or [9].

The access scheme considered is a TDMA scheme with a frame consisting of
$q^2$ time slots. The frame is divided into $q$ subframes $s$ of size $q$. The
time slot assigned to node $u$ in subframe $s$ ($s = 0, 1, \ldots, q-1$) is then
given by $f_u(s) \bmod q$ [9]. Consequently, one time slot is assigned to each
node in each subframe. Let $\Omega_u$ be the set of time slots assigned to node
$u$. Given that the number of subframes is $q$, $|\Omega_u| = q$.
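The slot assignment can be sketched in Python under a simplifying assumption. We assume that $q$ is prime, so that $GF(q)$ arithmetic is ordinary arithmetic modulo $q$. Prime powers would need genuine field arithmetic, which is not shown. Slots are numbered $0, \ldots, q^2 - 1$, with subframe $s$ occupying slots $sq, \ldots, sq + q - 1$; this numbering and the helper names are illustrative assumptions. Two distinct polynomials of degree at most $k$ agree in at most $k$ points, so any two nodes share at most $k$ slots per frame. The last helper checks this by brute force.

```python
import itertools


def poly_eval(coeffs, x, q):
    """Evaluate f(x) = a_0 + a_1 x + ... + a_k x^k modulo a prime q (Horner's rule)."""
    result = 0
    for a in reversed(coeffs):
        result = (result * x + a) % q
    return result


def assigned_slots(coeffs, q):
    """Omega_u: in subframe s (slots s*q .. s*q+q-1) node u owns offset f_u(s) mod q."""
    return {s * q + poly_eval(coeffs, s, q) for s in range(q)}


def max_pairwise_overlap(q, k):
    """Largest |Omega_u & Omega_w| over all pairs of distinct polynomials of degree <= k."""
    slot_sets = [assigned_slots(c, q) for c in itertools.product(range(q), repeat=k + 1)]
    return max(len(a & b) for a, b in itertools.combinations(slot_sets, 2))


omega_u = assigned_slots([1, 2], q=5)  # f_u(x) = 1 + 2x over GF(5)
omega_w = assigned_slots([3, 1], q=5)  # f_w(x) = 3 + x
print(sorted(omega_u), sorted(omega_u & omega_w))  # [1, 8, 10, 17, 24] [10]
print(max_pairwise_overlap(5, 1))  # 1
```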

The deterministic transmission policy, proposed in [8] and [9], is the following. 

The Deterministic Policy: Each node $u$ transmits in a slot $i$ only if
$i \in \Omega_u$, provided that it has data to transmit.

Let $O_{i,u\to v}$ be the set of nodes $x$ that satisfy two conditions. First,
their transmissions corrupt a particular transmission $u \to v$ (i.e.,
$x \in S_v \cup \{v\} - \{u\}$). Second, they are allowed to transmit in time slot
$i$ (i.e., they are assigned slot $i$, or $i \in \Omega_x$). Let $O^c_{i,u\to v}$
be the complementary set of nodes in $S_v \cup \{v\} - \{u\}$ for which
$i \notin \Omega_x$:

$$O_{i,u\to v} = \{x : x \in S_v \cup \{v\} - \{u\},\ i \in \Omega_x\}, \tag{1}$$

$$O^c_{i,u\to v} = \{x : x \in S_v \cup \{v\} - \{u\},\ i \notin \Omega_x\}. \tag{2}$$

Obviously, $|O_{i,u\to v}| + |O^c_{i,u\to v}| = |S_v|$.

Depending on the particular random assignment of the polynomials, it is
possible that two nodes are assigned overlapping time slots (i.e.,
$\Omega_u \cap \Omega_v \neq \emptyset$). Let $C_{u\to v}$ be the set of overlapping
time slots between those assigned to node $u$ and those assigned to any node
$x \in S_v \cup \{v\} - \{u\}$:

$$C_{u\to v} = \Omega_u \cap \bigcup_{x \in S_v \cup \{v\} - \{u\}} \Omega_x. \tag{3}$$

Obviously, if $i \in C_{u\to v}$ ($|O_{i,u\to v}| > 0$ when $i \in C_{u\to v}$), it is
possible that another transmission will corrupt transmission $u \to v$ in time
slot $i$, provided that there are data for transmission. If
$i \in \Omega_u - C_{u\to v}$ ($|O_{i,u\to v}| = 0$), no transmission corrupts
transmission $u \to v$ in time slot $i$.

Let $R_{u\to v}$ denote the set of time slots $i$, $i \notin \Omega_u$, over which
transmission $u \to v$ would be successful even under heavy traffic conditions
($\lambda = 1$). Equivalently, $R_{u\to v}$ contains those slots not included in the
set $\bigcup_{x \in S_v \cup \{v\}} \Omega_x$. Consequently,

$$|R_{u\to v}| = q^2 - \left|\bigcup_{x \in S_v \cup \{v\}} \Omega_x\right|. \tag{4}$$



In order to use all non-assigned time slots, including those in $R_{u\to v}$,
without the need for further coordination among the nodes, the following
probabilistic transmission policy was introduced in [10].

The Probabilistic Policy: Each node $u$ always transmits in slot $i$ if
$i \in \Omega_u$ and transmits with probability $p$ in slot $i$ if $i \notin \Omega_u$,
provided it has data to transmit.
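Both policies can be written as a single Python decision function that combines the traffic load with the slot rule. This is a sketch; the function signature and the use of a seeded `random.Random` generator are assumptions made for illustration. A node transmits in a non-assigned slot with overall probability $p\lambda$.

```python
import random


def transmits(slot, omega, policy, lam, p, rng):
    """Does a node with assigned slot set `omega` transmit in `slot`?

    lam: traffic load, the probability that the node has data in this slot.
    p:   access probability for non-assigned slots (Probabilistic Policy only).
    """
    if rng.random() >= lam:          # no data in this slot
        return False
    if slot in omega:                # assigned slot: both policies transmit
        return True
    if policy == "probabilistic":    # non-assigned slot: transmit with probability p
        return rng.random() < p
    if policy == "deterministic":    # non-assigned slots are never used
        return False
    raise ValueError(f"unknown policy: {policy!r}")


rng = random.Random(1)
omega = {0, 4, 8}  # e.g. q = 3, one slot per subframe
frame = [transmits(i, omega, "deterministic", 1.0, 0.0, rng) for i in range(9)]
print(frame)  # True only in slots 0, 4 and 8
```

With $p = 0$, the Probabilistic Policy reduces to the Deterministic Policy.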



3 System Throughput 



In order to derive the expressions for the system throughput under both the
Deterministic Policy and the Probabilistic Policy, it is necessary to derive the
expressions for the probability of success of a specific transmission
(throughput) under both policies. Let $P_{i,D,u\to v}$ ($P_{i,P,u\to v}$) denote the
probability that transmission $u \to v$ in slot $i$ is successful under the
Deterministic (Probabilistic) Policy. Let $P_{D,u\to v}$ ($P_{P,u\to v}$) be the
average probability over a frame for transmission $u \to v$ to be successful
during a time slot under the Deterministic (Probabilistic) Policy. That is,
$P_{D,u\to v} = \frac{1}{q^2}\sum_{i=1}^{q^2} P_{i,D,u\to v}$
($P_{P,u\to v} = \frac{1}{q^2}\sum_{i=1}^{q^2} P_{i,P,u\to v}$), where $q^2$ is the
frame size in time slots.

Under the Deterministic Policy, each node $u$ transmits in a time slot $i$ with
probability $\lambda$ if $i \in \Omega_u$. Any other node $x$ for which $i \in \Omega_x$
also transmits with probability $\lambda$. Consequently, $P_{i,D,u\to v} = 0$ for
$i \notin \Omega_u$, and $P_{i,D,u\to v} = \lambda(1-\lambda)^{|O_{i,u\to v}|}$ for
$i \in \Omega_u$. Given that $|O_{i,u\to v}| = 0$ for those slots
$i \in \Omega_u - C_{u\to v}$ ($|O_{i,u\to v}| > 0$ when $i \in C_{u\to v}$), it is
concluded that $P_{i,D,u\to v} = \lambda$ for $i \in \Omega_u - C_{u\to v}$. As a result,

$$P_{D,u\to v} = \frac{q - |C_{u\to v}|}{q^2}\,\lambda + \frac{\lambda}{q^2}\sum_{i \in C_{u\to v}} (1-\lambda)^{|O_{i,u\to v}|}, \tag{5}$$

where $|\Omega_u| = q$.
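Equation (5) can be checked against its definition with a short Python sketch. The sketch computes $P_{D,u\to v}$ slot by slot and in closed form. The three-node line topology and the hand-picked slot sets for $q = 3$ are hypothetical and were chosen only so that some slots overlap.

```python
def interferers(u, v, neighbors):
    """S_v | {v} - {u}: nodes whose transmissions would corrupt u -> v."""
    return (neighbors[v] | {v}) - {u}


def p_det_by_slots(u, v, omega, neighbors, lam, q):
    """P_{D,u->v} as the frame average of P_{i,D,u->v} (definition)."""
    others = interferers(u, v, neighbors)
    total = 0.0
    for i in omega[u]:                                   # P_{i,D,u->v} = 0 outside Omega_u
        n_on = sum(1 for x in others if i in omega[x])   # |O_{i,u->v}|
        total += lam * (1 - lam) ** n_on
    return total / q ** 2


def p_det_eq5(u, v, omega, neighbors, lam, q):
    """Closed form of Equation (5); assumes |Omega_u| = q."""
    others = interferers(u, v, neighbors)
    c = omega[u] & set().union(*(omega[x] for x in others))   # C_{u->v}
    tail = sum((1 - lam) ** sum(1 for x in others if i in omega[x]) for i in c)
    return (q - len(c)) * lam / q ** 2 + lam * tail / q ** 2


# q = 3 (9-slot frame), line topology 1 - 2 - 3, hand-picked slot sets.
omega = {1: {0, 3, 6}, 2: {1, 3, 8}, 3: {0, 5, 7}}
neighbors = {1: {2}, 2: {1, 3}, 3: {2}}
print(p_det_eq5(1, 2, omega, neighbors, lam=0.5, q=3))  # about 0.1111 (1/9)
```

The value stays below $\lambda/q = 1/6$, in line with the bound $P_D \le \lambda/q$ used in the proof of Theorem 1.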

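Under the Probabilistic Policy, if $i \in \Omega_u$, node $u$ transmits in time slot
$i$ with probability $\lambda$, while if $i \notin \Omega_u$ it transmits with
probability $p\lambda$. Any other node $x$ for which $i \in \Omega_x$ transmits in the
same time slot $i$ with probability $\lambda$, whereas if $i \notin \Omega_x$ it
transmits with probability $p\lambda$. Consequently, for $i \in \Omega_u$,
$P_{i,P,u\to v} = \lambda(1-\lambda)^{|O_{i,u\to v}|}(1-p\lambda)^{|O^c_{i,u\to v}|}$,
while for $i \notin \Omega_u$,
$P_{i,P,u\to v} = p\lambda(1-\lambda)^{|O_{i,u\to v}|}(1-p\lambda)^{|O^c_{i,u\to v}|}$.
Given that $|O_{i,u\to v}| + |O^c_{i,u\to v}| = |S_v|$, for $i \in \Omega_u$,
$P_{i,P,u\to v} = \lambda\left(\frac{1-\lambda}{1-p\lambda}\right)^{|O_{i,u\to v}|}(1-p\lambda)^{|S_v|}$,
while for $i \notin \Omega_u$,
$P_{i,P,u\to v} = p\lambda\left(\frac{1-\lambda}{1-p\lambda}\right)^{|O_{i,u\to v}|}(1-p\lambda)^{|S_v|}$.
By definition, $|O_{i,u\to v}| = 0$ if $i \in (\Omega_u - C_{u\to v}) \cup R_{u\to v}$, and
$|O_{i,u\to v}| > 0$ if $i \in C_{u\to v}$ or $i \notin \Omega_u \cup R_{u\to v}$. As a
result,

$$P_{P,u\to v} = \lambda(1-p\lambda)^{|S_v|}\left[\frac{q - |C_{u\to v}| + p|R_{u\to v}|}{q^2} + \frac{1}{q^2}\sum_{i \in C_{u\to v}}\left(\frac{1-\lambda}{1-p\lambda}\right)^{|O_{i,u\to v}|} + \frac{p}{q^2}\sum_{i \notin R_{u\to v} \cup \Omega_u}\left(\frac{1-\lambda}{1-p\lambda}\right)^{|O_{i,u\to v}|}\right]. \tag{6}$$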



The destination node $v$ of a particular transmission $u \to v$ depends on the
destination of the data. Consequently, it depends on the network application as
well as on the routing protocol. For the rest of this work, it is assumed that a
node $u$ transmits to only one node $v$, $v \in S_u$. For all numerical and
simulation results presented in the sequel, node $v$ is randomly selected from
$S_u$.

The probability of success of a transmission (averaged over all nodes in the
network) under each policy is referred to as the system throughput. It is
denoted by $P_D$ ($P_P$) under the Deterministic (Probabilistic) Policy.
According to Equations (5) and (6), the system throughput is given by
$P_D = \frac{1}{N}\sum_{\forall u \in V} P_{D,u\to v}$ and
$P_P = \frac{1}{N}\sum_{\forall u \in V} P_{P,u\to v}$, respectively.
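The following Python sketch checks Equation (6) and the system throughput average. It reuses the hypothetical three-node example with $q = 3$ and compares the closed form against a slot-by-slot evaluation of the definition. The helper names are assumptions. The closed form divides by $1 - p\lambda$, so it requires $p\lambda < 1$.

```python
def interferers(u, v, neighbors):
    """S_v | {v} - {u}; it has |S_v| elements when links are symmetric (u in S_v)."""
    return (neighbors[v] | {v}) - {u}


def p_prob_by_slots(u, v, omega, neighbors, lam, p, q):
    """P_{P,u->v} as the frame average of P_{i,P,u->v} (definition)."""
    others = interferers(u, v, neighbors)
    total = 0.0
    for i in range(q * q):
        prob = lam if i in omega[u] else p * lam            # u transmits in slot i
        for x in others:                                     # ...and no interferer does
            prob *= (1 - lam) if i in omega[x] else (1 - p * lam)
        total += prob
    return total / q ** 2


def p_prob_eq6(u, v, omega, neighbors, lam, p, q):
    """Closed form of Equation (6); assumes |Omega_u| = q and p * lam < 1."""
    others = interferers(u, v, neighbors)
    frame = set(range(q * q))
    heard = set().union(*(omega[x] for x in others))         # slots used by interferers
    c = omega[u] & heard                                     # C_{u->v}
    r = frame - omega[u] - heard                             # R_{u->v}
    rest = frame - omega[u] - r                              # i not in Omega_u or R_{u->v}
    ratio = (1 - lam) / (1 - p * lam)

    def n_on(i):                                             # |O_{i,u->v}|
        return sum(1 for x in others if i in omega[x])

    bracket = (q - len(c) + p * len(r)
               + sum(ratio ** n_on(i) for i in c)
               + p * sum(ratio ** n_on(i) for i in rest))
    return lam * (1 - p * lam) ** len(others) * bracket / q ** 2


def system_throughput(dest, omega, neighbors, lam, p, q):
    """P_P: Equation (6) averaged over all nodes u, each sending to v = dest[u]."""
    return sum(p_prob_eq6(u, dest[u], omega, neighbors, lam, p, q) for u in dest) / len(dest)


omega = {1: {0, 3, 6}, 2: {1, 3, 8}, 3: {0, 5, 7}}
neighbors = {1: {2}, 2: {1, 3}, 3: {2}}
print(p_prob_eq6(1, 2, omega, neighbors, lam=0.5, p=0.4, q=3),
      p_prob_by_slots(1, 2, omega, neighbors, lam=0.5, p=0.4, q=3))  # equal up to rounding
print(system_throughput({1: 2, 2: 3, 3: 2}, omega, neighbors, lam=0.5, p=0.4, q=3))
```

Setting $p = 0$ recovers the value given by Equation (5) for the same example.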



4 System Throughput Maximization 

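As it is difficult to derive analytical expressions and establish the conditions
for system throughput maximization, a more tractable form of $P_P$ is
considered. Let $\hat{P}_P$ denote the value of $P_P$ when $\frac{1-\lambda}{1-p\lambda}$ is
replaced by 1. With this replacement, the bracket in Equation (6) sums to
$[q + p(q^2 - q)]/q^2$, because $q^2 - q - |R_{u\to v}|$ slots lie outside
$\Omega_u \cup R_{u\to v}$. Then,

$$\hat{P}_P = \frac{1}{N}\sum_{\forall u \in V}\frac{1 + p(q-1)}{q}\,\lambda(1-p\lambda)^{|S_v|}. \tag{7}$$

Since $\frac{1-\lambda}{1-p\lambda} \le 1$, it is clear that $\hat{P}_P \ge P_P$.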

Even though $\hat{P}_P$ has a more tractable form than $P_P$, it still cannot
easily be analyzed further. It can be seen from Equation (7) that $\hat{P}_P$
corresponds to a polynomial of degree $D + 1$ in $p$, which is difficult or
impossible to solve in the general case ($D > 1$). A more tractable form of
$\hat{P}_P$, denoted by $\tilde{P}_P$, is analyzed instead. $\tilde{P}_P$ is obtained
from Equation (7) by replacing each $|S_v|$ with
$|\bar{S}| = \frac{1}{N}\sum_{\forall v \in V}|S_v|$. Finally, $\tilde{P}_P$ is given
by the following equation:

$$\tilde{P}_P = \frac{1 + p(q-1)}{q}\,\lambda(1-p\lambda)^{|\bar{S}|}. \tag{8}$$

$|\bar{S}|$ is the average number of neighbor nodes of each node in the network.
Let $|\bar{S}|/D$ be referred to as the topology density of a particular network.
Given that $D$ is known (and constant), knowledge of $|\bar{S}|$ is enough to
determine the topology density $|\bar{S}|/D$, and vice versa. As can be concluded
from Equation (8), $|\bar{S}|$ (or the topology density $|\bar{S}|/D$) influences
the system throughput exponentially.
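Equations (7) and (8) are easy to evaluate directly, as the following Python sketch shows. The neighbour counts in the example are hypothetical, and the function names are assumptions. For a fixed list of counts, the map $x \mapsto (1-p\lambda)^x$ is convex. Replacing each $|S_v|$ by the average of those same counts therefore cannot increase the value (Jensen's inequality). When all counts are equal, the two expressions coincide.

```python
def hat_throughput(p, lam, q, dest_neighbor_counts):
    """Equation (7): average over nodes u, given |S_v| for each u's destination v."""
    factor = (1 + p * (q - 1)) / q
    total = sum(factor * lam * (1 - p * lam) ** s for s in dest_neighbor_counts)
    return total / len(dest_neighbor_counts)


def tilde_throughput(p, lam, q, s_bar):
    """Equation (8): every |S_v| replaced by the average neighbour count s_bar."""
    return (1 + p * (q - 1)) / q * lam * (1 - p * lam) ** s_bar


counts = [2, 6, 10]  # hypothetical |S_v| values with average 6
print(hat_throughput(0.2, 0.5, 13, counts), tilde_throughput(0.2, 0.5, 13, 6))
print(tilde_throughput(0.0, 0.5, 13, 6))  # equals lam / q at p = 0
```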

The following two theorems show that there exists a range of values of $p$ of
the form $[0, p_{max}]$ such that $\tilde{P}_P \ge P_D$, irrespective of the value of
$\lambda$. Here $p_{max}$, $0 < p_{max} \le 1$, is the largest value of $p$ for which
$\tilde{P}_P \ge \lambda/q$; see the proof of Theorem 2.

Theorem 1. There exists an efficient range of values for $p$ such that
$\tilde{P}_P \ge P_D$, irrespective of the value of $\lambda$.

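Proof. Under the Deterministic Policy, $q$ is the maximum number of time slots
during which a transmission is successful. Therefore, $q\lambda$ is the maximum
(average) number of successful transmissions in one frame. For any transmission
$u \to v$, according to Equation (5),
$\lambda(q - |C_{u\to v}|) + \lambda\sum_{i \in C_{u\to v}}(1-\lambda)^{|O_{i,u\to v}|} \le q\lambda$.
Consequently, $P_D \le \lambda/q$. It thus suffices to identify the range of values
of $p$ for which $\tilde{P}_P \ge \lambda/q$.

It can be shown that

$$\frac{d\tilde{P}_P}{dp} = \frac{\lambda}{q}(1-p\lambda)^{|\bar{S}|-1}\left[q - 1 - \lambda|\bar{S}| - p\lambda(|\bar{S}|+1)(q-1)\right],$$

and therefore $\frac{d\tilde{P}_P}{dp} = 0$ for
$p = \frac{q-1-\lambda|\bar{S}|}{\lambda(|\bar{S}|+1)(q-1)}$ ($= p_0$). Given that
$\tilde{P}_P = \lambda/q$ for $p = 0$, the required range of values for $p$ does not
exist if $\frac{d\tilde{P}_P}{dp} < 0$ for all $p \in (0, 1]$. This condition is
satisfied if $p_0 < 0$. As may also be shown, $p_0 < 0$ if
$\lambda > \frac{q-1}{|\bar{S}|}$, since the numerator of $p_0$ is then negative.

Given that $|\bar{S}| \le D$ (equality holds if all nodes have $D$ neighbor nodes)
and $q \ge kD + 1 \ge D + 1$ (equality holds for $k = 1$), it is concluded that
$|\bar{S}| \le q - 1$, or $\frac{q-1}{|\bar{S}|} \ge 1$. Given that $\lambda \le 1$,
$\lambda > \frac{q-1}{|\bar{S}|}$ is not satisfied. Consequently, it is not possible
that $\frac{d\tilde{P}_P}{dp} < 0$ for all $p \in (0, 1]$. Therefore, there exists a
range of values for $p$ such that $\tilde{P}_P \ge P_D$, irrespective of the value
of $\lambda$. □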
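The closed-form derivative and the stationary point $p_0$ from this proof can be checked numerically with a short Python sketch. The parameters $q = 13$, $|\bar{S}| = 6$ and $\lambda = 0.8$ are hypothetical. The sketch compares the formula with a central finite difference and confirms that the derivative vanishes at $p_0$.

```python
def tilde_throughput(p, lam, q, s_bar):
    """Equation (8)."""
    return (1 + p * (q - 1)) / q * lam * (1 - p * lam) ** s_bar


def d_tilde_dp(p, lam, q, s_bar):
    """Closed-form derivative of Equation (8) with respect to p."""
    return (lam / q) * (1 - p * lam) ** (s_bar - 1) * (
        q - 1 - lam * s_bar - p * lam * (s_bar + 1) * (q - 1))


def p0(lam, q, s_bar):
    """Unique zero of the derivative (it may lie outside [0, 1])."""
    return (q - 1 - lam * s_bar) / (lam * (s_bar + 1) * (q - 1))


q, s_bar, lam, h = 13, 6, 0.8, 1e-6
p = 0.3
numeric = (tilde_throughput(p + h, lam, q, s_bar)
           - tilde_throughput(p - h, lam, q, s_bar)) / (2 * h)
print(numeric, d_tilde_dp(p, lam, q, s_bar))  # the two values agree closely
print(d_tilde_dp(0.0, lam, q, s_bar) > 0)     # True: the throughput first rises with p
```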

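Theorem 2. The efficient range of values for $p$ of Theorem 1 is of the form
$[0, p_{max}]$, for some $0 < p_{max} \le 1$.

Proof. If $\lambda > \frac{q-1}{q|\bar{S}|+q-1}$ is satisfied, then it may be shown
that $p_0 < 1$, where $p_0 = \frac{q-1-\lambda|\bar{S}|}{\lambda(|\bar{S}|+1)(q-1)}$. It can
be shown that $\frac{d\tilde{P}_P}{dp} > 0$ for $p$ close to 0. Given that
$\frac{d\tilde{P}_P}{dp} = 0$ for $p = p_0$, it follows that $\frac{d\tilde{P}_P}{dp} > 0$
for $p < p_0$ and $\frac{d\tilde{P}_P}{dp} < 0$ for $1 \ge p > p_0$
($\frac{d\tilde{P}_P}{dp} = 0$ for only one value of $p \in (0, 1)$). As a result, when
$\lambda > \frac{q-1}{q|\bar{S}|+q-1}$, $\tilde{P}_P = \lambda/q$ for $p = 0$, and as $p$
increases, $\tilde{P}_P$ increases until $p = p_0$. As $p$ increases further
($p > p_0$), $\tilde{P}_P$ decreases until $p = 1$, where
$\tilde{P}_P = \lambda(1-\lambda)^{|\bar{S}|}$. If $\lambda(1-\lambda)^{|\bar{S}|} \ge \lambda/q$,
then $p_{max} = 1$. If $\lambda(1-\lambda)^{|\bar{S}|} < \lambda/q$, there exists a value
$p = p_{max} < 1$ such that $\tilde{P}_P = \lambda/q$.

If $\lambda \le \frac{q-1}{q|\bar{S}|+q-1}$, then $\frac{d\tilde{P}_P}{dp} \ge 0$ for any
value of $p$. Consequently, $\tilde{P}_P$ constantly increases, so
$p_{max} = 1$ and the maximum is attained for $p = 1$. □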
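The case analysis in this proof translates directly into a numerical routine for $p_{max}$. The following Python sketch is an illustration with hypothetical parameters, not a procedure from the paper. If $\tilde{P}_P(1) \ge \lambda/q$, it returns 1. Otherwise, it bisects on $[p_0, 1]$, where $\tilde{P}_P$ is decreasing and crosses $\lambda/q$ exactly once.

```python
def tilde_throughput(p, lam, q, s_bar):
    """Equation (8)."""
    return (1 + p * (q - 1)) / q * lam * (1 - p * lam) ** s_bar


def p_max(lam, q, s_bar, tol=1e-12):
    """Upper end of the efficient range [0, p_max] on which tilde_P >= lam / q."""
    floor = lam / q                      # tilde_P at p = 0, an upper bound on P_D
    if tilde_throughput(1.0, lam, q, s_bar) >= floor:
        return 1.0                       # the whole interval [0, 1] is efficient
    # Otherwise tilde_P rises up to p_0 and then falls below floor before p = 1.
    p0 = (q - 1 - lam * s_bar) / (lam * (s_bar + 1) * (q - 1))
    lo, hi = max(p0, 0.0), 1.0           # invariant: tilde_P(lo) >= floor > tilde_P(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tilde_throughput(mid, lam, q, s_bar) >= floor:
            lo = mid
        else:
            hi = mid
    return lo


print(p_max(1.0, 13, 6))   # heavy traffic: p_max < 1
print(p_max(0.05, 13, 6))  # light traffic: 1.0
```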
Theorem 3. If $\lambda \le \frac{q-1}{q|\bar{S}|+q-1}$, $\tilde{P}_P$ is maximized for
$p = 1$. If $\lambda > \frac{q-1}{q|\bar{S}|+q-1}$, $\tilde{P}_P$ is maximized for
$p = \frac{q-1-\lambda|\bar{S}|}{\lambda(|\bar{S}|+1)(q-1)}$.

Proof. For $\lambda \le \frac{q-1}{q|\bar{S}|+q-1}$, as shown in the proof of
Theorem 2, $\frac{d\tilde{P}_P}{dp} \ge 0$. Consequently, the maximum value of
$\tilde{P}_P$ is attained for $p = 1$.

For $\lambda > \frac{q-1}{q|\bar{S}|+q-1}$, as shown in the proof of Theorem 2,
$\frac{d\tilde{P}_P}{dp} \ge 0$ for $p \in [0, p_0]$ and $\frac{d\tilde{P}_P}{dp} \le 0$
for $p \in [p_0, 1]$. Consequently, the maximum value of $\tilde{P}_P$ is attained
for $p = p_0 = \frac{q-1-\lambda|\bar{S}|}{\lambda(|\bar{S}|+1)(q-1)}$. □

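Based on the results of Theorem 3, the value of the access probability $p$ that
maximizes $\tilde{P}_P$, denoted by $p_{\lambda,|\bar{S}|}$, is given by the following
equation:

$$p_{\lambda,|\bar{S}|} = \begin{cases} 1, & \text{if } \lambda \le \frac{q-1}{q|\bar{S}|+q-1},\\[4pt] \frac{q-1-\lambda|\bar{S}|}{\lambda(|\bar{S}|+1)(q-1)}, & \text{if } \lambda > \frac{q-1}{q|\bar{S}|+q-1}. \end{cases} \tag{9}$$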



For $p = p_{\lambda,|\bar{S}|}$, $\tilde{P}_P$ is maximized. It is also expected that
$P_P$ will achieve better performance than with a fixed value of $p$ when $\lambda$
is not constant.
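Equation (9) can be cross-checked against a brute-force search over $p$ with a short Python sketch. The values $q = 13$ and $|\bar{S}| = 6$ are hypothetical. One traffic load is chosen below the threshold $\frac{q-1}{q|\bar{S}|+q-1} \approx 0.133$ and one above it.

```python
def tilde_throughput(p, lam, q, s_bar):
    """Equation (8)."""
    return (1 + p * (q - 1)) / q * lam * (1 - p * lam) ** s_bar


def p_opt(lam, q, s_bar):
    """Equation (9): the access probability that maximises Equation (8)."""
    if lam <= (q - 1) / (q * s_bar + q - 1):
        return 1.0
    return (q - 1 - lam * s_bar) / (lam * (s_bar + 1) * (q - 1))


def grid_max(lam, q, s_bar, steps=10_000):
    """Brute-force maximum of Equation (8) over an evenly spaced grid on [0, 1]."""
    return max(tilde_throughput(k / steps, lam, q, s_bar) for k in range(steps + 1))


for lam in (0.1, 0.8):
    p = p_opt(lam, 13, 6)
    print(lam, p, tilde_throughput(p, lam, 13, 6) >= grid_max(lam, 13, 6) - 1e-12)
```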



5 Unawareness of $\lambda$ and $|\bar{S}|$



Knowledge of both $\lambda$ and $|\bar{S}|$ is required in order to utilize the
results of the analysis presented in Section 4 and, in particular, Equation (9).
If the topology density $|\bar{S}|/D$ is known, $|\bar{S}|$ can be calculated ($D$ is
known). In the general case, it is possible that $\lambda$ and/or $|\bar{S}|$ are not
known. It would therefore be useful to derive analytical expressions for those
values of $p$ for which the system throughput under the Probabilistic Policy,
even though not maximized, is at least higher than that under the Deterministic
Policy. Three cases are considered:

- When $\lambda$ is known but $|\bar{S}|$ is not, the maximum topology density value
(corresponding to $|\bar{S}| = D$) is considered. The corresponding value of $p$,
denoted by $p_\lambda$, is given by Equation (10).
- When $|\bar{S}|$ is known but $\lambda$ is not, a heavy traffic scenario
(corresponding to $\lambda = 1$) is considered. The corresponding value of $p$,
denoted by $p_{|\bar{S}|}$, is given by Equation (11).
- When neither $\lambda$ nor $|\bar{S}|$ is known, both assumptions are combined
($\lambda = 1$ and $|\bar{S}| = D$). The corresponding value of $p$, denoted by
$\hat{p}$, is given by Equation (12).



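$$p_\lambda = \begin{cases} 1, & \text{if } \lambda \le \frac{q-1}{qD+q-1},\\[4pt] \frac{q-1-\lambda D}{\lambda(D+1)(q-1)}, & \text{if } \lambda > \frac{q-1}{qD+q-1}, \end{cases} \tag{10}$$

$$p_{|\bar{S}|} = \frac{q-1-|\bar{S}|}{(|\bar{S}|+1)(q-1)}, \tag{11}$$

$$\hat{p} = \frac{q-1-D}{(D+1)(q-1)}. \tag{12}$$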
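The three fallback values can be computed side by side with Equation (9) in a short Python sketch. The parameters $q = 13$, $D = 10$, $|\bar{S}| = 6$ and $\lambda = 0.5$ are hypothetical and do not come from the paper. The stationary point $p_0$ decreases as $\lambda$ or $|\bar{S}|$ grows. Each fallback value is therefore no larger than $p_{\lambda,|\bar{S}|}$, so it lies where $\tilde{P}_P$ is non-decreasing. Within the approximation $\tilde{P}_P$, the throughput at each fallback value is thus at least $\lambda/q \ge P_D$. The test checks this for the example.

```python
def tilde_throughput(p, lam, q, s_bar):
    """Equation (8)."""
    return (1 + p * (q - 1)) / q * lam * (1 - p * lam) ** s_bar


def p_opt(lam, q, s_bar):
    """Equation (9): maximiser of Equation (8) when lam and s_bar are known."""
    if lam <= (q - 1) / (q * s_bar + q - 1):
        return 1.0
    return (q - 1 - lam * s_bar) / (lam * (s_bar + 1) * (q - 1))


def p_lambda(lam, q, d):
    """Equation (10): s_bar unknown, so assume the densest case s_bar = D."""
    return p_opt(lam, q, d)


def p_s_bar(q, s_bar):
    """Equation (11): lam unknown, so assume heavy traffic (lam = 1)."""
    return (q - 1 - s_bar) / ((s_bar + 1) * (q - 1))


def p_hat(q, d):
    """Equation (12): neither lam nor s_bar known (lam = 1, s_bar = D)."""
    return (q - 1 - d) / ((d + 1) * (q - 1))


q, d, s_bar, lam = 13, 10, 6, 0.5
candidates = {"p_opt": p_opt(lam, q, s_bar), "p_lambda": p_lambda(lam, q, d),
              "p_s_bar": p_s_bar(q, s_bar), "p_hat": p_hat(q, d)}
for name, p in candidates.items():
    print(f"{name:8s} p={p:.4f}  tilde_P={tilde_throughput(p, lam, q, s_bar):.5f}")
print(f"deterministic bound lam/q = {lam / q:.5f}")
```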
6 Simulation Results

In this section, networks of 100 nodes are considered for various values of $D$
and of the topology density $|\bar{S}|/D$. The aim is to demonstrate the
applicability of the analytical results for a variety of topologies with
different characteristics. In particular, four different topology categories
are considered. The number of nodes in each topology category is set to
$N = 100$, while $D$ is set to 5, 10, 15 and 20. These four topology categories
are denoted as D5N100, D10N100, D15N100 and D20N100, respectively, and each
corresponds to a certain value of topology density $|\bar{S}|/D$. Figure 1 depicts
simulation results for the system throughput $P_P$ for different topology
density values close to 0.6. In all cases, the number of neighbor nodes is not
the same for every node. This leads to nonzero values of the topology density
variation $\mathrm{Var}\{|S|\}$. The algorithm presented in [9] is used to derive
the sets of scheduling slots. The system throughput is calculated by averaging
the simulation results over 100 frames. Time slot sets $\Omega_x$ are assigned
randomly to each node $x$ for each particular topology. The particular assignment
is kept the same for each topology category throughout the simulations.

Figure 1 depicts simulation results for the system throughput $P_P$ as a
function of $\lambda$, for different values of $p$: $p = 0$,
$p = p_{\lambda,|\bar{S}|}$, $p = p_\lambda$, $p = p_{|\bar{S}|}$, $p = \hat{p}$ and
$p = 1.0$. For $p = 0$, the system throughput under the Probabilistic Policy is
identical to that under the Deterministic Policy ($P_P = P_D$). This case is
depicted throughout the simulation results for comparison. For $p = 1.0$, as
$\lambda$ increases, $P_P$ rises rather quickly to a certain maximum value. It then
decreases until $\lambda = 1.0$, where $P_P = 0$. For $p = p_{\lambda,|\bar{S}|}$, the
system throughput under the Probabilistic Policy is not only higher than that
under the Deterministic Policy but also close to the maximum for all topologies.

For $p = p_\lambda$ and small values of $\lambda$, as can be seen in Figure 1, the
system throughput is identical to that obtained for $p = p_{\lambda,|\bar{S}|}$ and
$p = 1.0$. This follows from Equations (9) and (10): for
$\lambda \le \frac{q-1}{qD+q-1}$, $p_{\lambda,|\bar{S}|} = p_\lambda = 1.0$. For
$\lambda > \frac{q-1}{qD+q-1}$, the system throughput is not close to that obtained
for $p = p_{\lambda,|\bar{S}|}$, but it is still higher than that obtained under the
Deterministic Policy.

For $p = p_{|\bar{S}|}$ and small values of $\lambda$, the system throughput is not
close to that obtained for $p = p_{\lambda,|\bar{S}|}$. However, it is higher than
that obtained under the Deterministic Policy, irrespective of the topology
density value. As $\lambda$ increases, the system throughput increases, and for
large values of $\lambda$ it is close to the maximum.

For $p = \hat{p}$, the system throughput curve is similar to that for
$p = p_{|\bar{S}|}$, except that the obtained system throughput is smaller. As the
topology density increases, the system throughput for both cases ($p = \hat{p}$
and $p = p_{|\bar{S}|}$) converges. Again, the obtained system throughput is
higher than that obtained under the Deterministic Policy. This is an important
observation, since $\hat{p}$ is calculated without knowledge of either $\lambda$ or
$|\bar{S}|$.
Fig. 1. System throughput simulation results as a function of $\lambda$ for different values of
$p$ ($p = 0$, $p = p_{\lambda,|\bar{S}|}$, $p = p_\lambda$, $p = p_{|\bar{S}|}$, $p = \hat{p}$ and $p = 1.0$) and medium topology density
values $|\bar{S}|/D$.



7 Conclusions 

The system throughput under both the Deterministic Policy and the Probabilistic
Policy has been investigated previously for heavy traffic conditions [8], [9],
[10]. In this work, expressions have been derived and analyzed for the general
non-heavy traffic case. This analysis established conditions and provided
expressions under which the system throughput under the Probabilistic Policy is
not only higher than that under the Deterministic Policy but also maximized,
for various traffic loads.

Expressions for the system throughput were derived for both policies as a
function of the traffic load. These expressions determine the values of the
access probability $p$ for which the system throughput under the Probabilistic
Policy is higher than that under the Deterministic Policy. When both the traffic
load ($\lambda$) and the topology density ($|\bar{S}|/D$) are known, the system
throughput is maximized for the derived value of the access probability,
$p = p_{\lambda,|\bar{S}|}$. When $\lambda$ and/or $|\bar{S}|/D$ are not known,
analytical expressions for the appropriate values of $p$ were also derived
($p = p_\lambda$, $p = p_{|\bar{S}|}$ and $p = \hat{p}$). These values lead to a
system throughput which, even though not maximized, is higher than that achieved
under the Deterministic Policy. Simulations conducted for a variety of topologies
with different characteristics and different traffic loads support the claims
and the expectations of the analysis.

In conclusion, a simple and easily implemented transmission policy like the
Probabilistic Policy allows nodes to transmit without any need for coordination
among them and without considering topological changes; this is
essential for ad-hoc networks. The results of the analysis provided in this paper 
can be used to specify an access probability that achieves a system throughput 
close to the maximum for different traffic loads. 
Abstract. To achieve Quality of Service (QoS) in Next Generation Networks
(NGNs), the Differentiated Services architecture implements appropriate Per Hop
Behavior (PHB) for service differentiation. Common recommendations to enforce
appropriate PHB include Weighted Round Robin (WRR), Deficit Round-Robin
(DRR) and similar algorithms. They assign a fixed bandwidth share to Transport
Service Classes (TSCs) of different priority. This is a viable approach if the
ratio of high priority traffic $TSC_{high}$ over low priority traffic $TSC_{low}$ is
known in advance. If $TSC_{high}$ holds more users and $TSC_{low}$ fewer users than
expected, the QoS for $TSC_{high}$ can be worse than for $TSC_{low}$. As shown in
preceding work, the Modified Earliest Deadline First (MEDF) algorithm solves this
problem on the packet level. Therefore, we investigate its impact in congested
TCP/IP networks by simulations and show its attractiveness as a powerful service
differentiation mechanism.



1 Introduction 

Current research on multi-service Next Generation Networks (NGNs) focuses, among
other topics, on the provision of Quality of Service (QoS) for different service
classes. The Differentiated Services architecture [1], [2] achieves QoS by
implementing appropriate Per Hop Behavior (PHB) for different Transport Service
Classes (TSCs). Flows of different TSCs compete for two router resources: buffer
space and forwarding speed. Mechanisms that assign those resources divide buffer
space among different TSCs (buffer management) and control the order in which
packets are dequeued and forwarded (scheduling). Therefore, those mechanisms can
be characterized along two dimensions: space and time.

Common examples and recommendations [3], [4] to enforce appropriate PHB are
algorithms like Weighted Round Robin (WRR), Class Based Queueing (CBQ) [5], and
Deficit Round-Robin (DRR) [6]. The common goal is to assign a fair share of
network resources to different TSCs. The share is set in advance and fixed
independently of the actual traffic mix. This behavior is desirable in many
situations. Consider a network where a ratio $q$ of high priority TSC
($TSC_{high}$) traffic over low priority TSC ($TSC_{low}$) traffic is expected,
e.g. due to network admission control. There, the algorithms can be used to
assign this share. The low priority TSC ($TSC_{low}$) uses the remaining
bandwidth, of which a fraction $1 - q$ is guaranteed. However, if resources are
scarce and buffers always contain packets of both classes, these algorithms
enforce the share $q$ regardless of the current traffic mix. In particular, if
the $TSC_{high}$ traffic exceeds the limit set by the control parameters, it
suffers from QoS degradation.



J. Sole-Pareta et al. (Eds.): QofIS 2004, LNCS 3266, pp. 94-103, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004
The authors of [7] introduced the priority algorithm Modified Earliest Deadline
First (MEDF). They showed that MEDF prefers $TSC_{high}$ over $TSC_{low}$ on the
packet level equally, regardless of the traffic mix ratio. This is a clear
advantage of MEDF compared to the previously mentioned algorithms, which assign a
fixed share to the whole $TSC_{high}$ aggregate.

In this paper we focus on the impact of MEDF in TCP/IP networks. For saturated
TCP sources, conventional algorithms are problematic because of the fixed share
of bandwidth assigned to each traffic class regardless of the current number of
flows. We bring the MEDF algorithm into play to achieve a relative,
traffic-mix-independent per-flow prioritization among TSCs. At the same time,
this behavior is easily configurable through per-class relative delay factors.

The algorithms that work on the IP packet level impact the performance of adap- 
tive TCP flows by packet loss and delay (round trip time). Packet loss is influenced by 
space priority mechanisms, delay by time priority mechanisms. In this work we combine 
MEDF with space priority mechanisms like Full Buffer Sharing (FBS) and Buffer Shar- 
ing with Space Priority (BSSP) and contrast it to time priority mechanisms like First In 
First Out (FIFO) and Static Priority (SP). 

This work is structured as follows. In Section 2 we present the algorithms under 
study in detail. Section 3 discusses the simulation environment, the respective parameters 
used for our performance evaluation study, and presents the results obtained from our 
simulations. Sections 4 and 5 finally conclude this work with a short summary and 
outlook on future research. 



2 Space and Time Priority Mechanisms 

Network congestion arises where different flows compete for resources at routers in 
the network. To avoid this problem at least for a certain subset of high priority flows, 
flows of higher priority should receive preferential service as opposed to low priority 
flows. Basically, if packet arrivals exceed the router forwarding speed temporarily or 
permanently, congestion arises and buffers fill up. This leads to longer network delays
and higher packet loss rates, and thus to degraded Quality of Service. Buffer sizes and forwarding
speed are fixed parameters for given networks. To assign these scarce resources, we 
can limit the space available to the respective flows (buffer management) or we can 
dequeue the packets depending on their priority (scheduling). Thus, mechanisms to 
achieve service differentiation can be divided along two dimensions: space and time. 
Combinations of both are also possible. 



2.1 Space Priority Mechanisms 

We use two kinds of space priority mechanisms for our performance evaluation: Full
Buffer Sharing and Buffer Sharing with Space Priority. In [10] we compare a third
space priority mechanism, Random Early Detection (RED) gateways [11]. RED was
originally designed to detect incipient congestion by measuring the average queue
length. Several improvements have been suggested, for instance in [12] and [13],
to achieve fairness in the presence of non-adaptive connections and to introduce
TSC priorities. We omit this section here for lack of space and refer to our
technical report [10].

Fig. 1. Buffer Sharing with Space Priority for $i = 3$ TSCs

In the following sections, we denote the router buffer by $B$ and packets by $P$.
The function $S(B)$ refers to the maximum buffer size and $F(B)$ to the current
fill level of the buffer. The function enqueueTail(P, B) enqueues the packet $P$
into the buffer $B$. The function drop(P) drops the packet $P$ if the algorithm
cannot accept it.

Full Buffer Sharing (FBS). The FBS strategy allows all flows to share the same buffer 
irrespective of their priority. If not mentioned differently, we use this mechanism as 
default in our simulations. 

Buffer Sharing with Space Priority (BSSP). The BSSP queueing strategy (cf. Alg. 1)
is threshold based. It allows packets to occupy buffer space available for their
TSC and for all TSCs of lower priority. Let $TSC_i$, $i \in \{0, \ldots, n-1\}$, be
TSCs of different priority, 0 being the highest priority. $TSC_i$ can demand at
most space $BS_i^{max}$ in the buffer, where $BS_i^{max} \ge BS_{i+1}^{max}$ and
$BS_0^{max}$ is set to the actual buffer size. The concept is illustrated in Fig. 1
for three TSCs and fully described in Alg. 1, with $F(B, TSC_i)$ denoting the space
in the buffer $B$ that is currently filled by $TSC_i$. There is a guaranteed
amount of buffer space for the highest priority class only; lower priority
classes may find their share taken by classes of higher priority. This concept
resembles the Russian Dolls bandwidth constraints Model (RDM) suggested by the
IETF Traffic Engineering Working Group (TEWG) in [14].



Require: Packet P, Buffer B, max TSC Buffer Size BS_i^max for i = 0, ..., n − 1
  {max Buffer Size S(B) = BS_0^max}
  i = TSC(P)
  if F(B, TSC_i) < BS_i^max then
    enqueueTail(P, B)
  else {space limit exceeded for TSC_i}
    drop(P)
  end if

Algorithm 1: Buffer Sharing with Space Priority ENQUEUE
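The FBS rule and Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under assumptions the text does not state: occupancy is counted in packets, F(B, TSC_i) is read literally as the per-class fill, and, as an added safeguard, a packet is never accepted while the buffer is physically full. The class and function names (`SharedBuffer`, `enqueue_fbs`, `enqueue_bssp`) are hypothetical.

```python
from collections import deque


class SharedBuffer:
    """Router buffer B shared by n TSCs; occupancy is counted in packets."""

    def __init__(self, bs_max):
        # bs_max[i] is BS_i^max; bs_max[0] is the total buffer size S(B).
        self.bs_max = list(bs_max)
        self.queue = deque()               # one physical FIFO of (tsc, packet)
        self.per_tsc = [0] * len(bs_max)   # F(B, TSC_i) for each class

    def fill(self):
        """F(B): current fill level of the whole buffer."""
        return len(self.queue)

    def enqueue_tail(self, tsc, packet):
        self.queue.append((tsc, packet))
        self.per_tsc[tsc] += 1

    def dequeue_head(self):
        tsc, packet = self.queue.popleft()
        self.per_tsc[tsc] -= 1
        return tsc, packet


def enqueue_fbs(buf, tsc, packet):
    """Full Buffer Sharing: accept any packet while S(B) is not reached."""
    if buf.fill() < buf.bs_max[0]:
        buf.enqueue_tail(tsc, packet)
        return True
    return False  # drop(P)


def enqueue_bssp(buf, tsc, packet):
    """Alg. 1: accept only if TSC_i is below BS_i^max (and B is not full)."""
    if buf.per_tsc[tsc] < buf.bs_max[tsc] and buf.fill() < buf.bs_max[0]:
        buf.enqueue_tail(tsc, packet)
        return True
    return False  # drop(P): space limit exceeded for TSC_i


buf = SharedBuffer([4, 2])   # S(B) = 4 packets, TSC_1 limited to 2 packets
print([enqueue_bssp(buf, 1, k) for k in range(3)])   # third low priority packet is dropped
print([enqueue_bssp(buf, 0, k) for k in range(3)])   # high priority fills the remaining space
```

The single physical FIFO keeps the sketch neutral with respect to the time priority mechanism. Space priority only decides admission, and a scheduler decides the departure order.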



2.2 Time Priority Mechanisms 

Once packets arrive at the queue and the space priority mechanism assigns available
buffer space, i.e., decides whether a packet is accepted or dropped, the time priority
mechanism decides which packet to dequeue next. This decision on the packet level
influences the delay and therefore, via the RTT, the TCP sending rate. We contrast two time
priority mechanisms with Modified Earliest Deadline First (MEDF).



First in First Out (FIFO). FIFO leaves the prioritization to the enqueueing (space
priority) mechanism and is used as the performance baseline. Packets proceed in the order
in which they arrive and are accepted by the space priority mechanism.



Static Priority (SP). The Static Priority concept chooses TSC_high packets in FIFO
order as long as packets of that class are in the buffer. TSC_low packets wait in the router
queue until only low priority packets are available. Then they are also dequeued in
FIFO order until new TSC_high packets arrive.
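The two baseline dequeue rules can be sketched as follows. This is a minimal sketch, assuming one FIFO per TSC for SP. In the simulations both TSCs share one buffer whose space is managed by FBS or BSSP, so the separate deques here are only a modelling convenience. The function names are hypothetical.

```python
from collections import deque


def dequeue_fifo(queue):
    """FIFO: serve packets strictly in arrival order, whatever their TSC."""
    return queue.popleft() if queue else None


def dequeue_static_priority(high, low):
    """Static Priority: serve TSC_high in FIFO order while any is waiting;
    TSC_low packets leave only when no TSC_high packet is queued."""
    if high:
        return high.popleft()
    if low:
        return low.popleft()
    return None


high = deque(["h1", "h2"])
low = deque(["l1", "l2"])
print([dequeue_static_priority(high, low) for _ in range(3)])
high.append("h3")   # a new TSC_high arrival is served before the remaining TSC_low packet
print([dequeue_static_priority(high, low) for _ in range(3)])
```

Under sustained TSC_high load the `low` deque is never reached, which is the starvation effect discussed for SP in Sect. 3.2.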



Modified Earliest Deadline First (MEDF). In the context of the UMTS Terrestrial Radio
Access Network, the authors of [7] introduced a modified version of the Earliest Deadline
First (EDF) algorithm called Modified Earliest Deadline First (MEDF). It supports only n
different TSCs but, in contrast to EDF, it is easier to implement. Packets are stored
in n TSC-specific queues in FIFO order. They are stamped with a modified deadline,
namely their arrival time plus an offset M_i, 0 ≤ i < n, which is characteristic for each
TSC. The MEDF scheduler selects for transmission the packet with the earliest due
date among the packets at the front positions of all queues. For only two TSCs, this is a
choice between two packets, and sorting according to ascending deadlines is not required.
The difference |M_i − M_j| between two TSCs i and j is a relative delay advantage that
influences the behavior of the scheduler. We are interested in the performance of this
scheduling algorithm in the presence of adaptive traffic, here TCP.

For our simulations we use two TSCs whose queues are implemented as shared
buffers such that the space priority mechanisms are applicable. With two TSCs we set the
MEDF parameters to M_high = 0 and M_low = x, x ∈ {0 s, 0.1 s, 0.5 s, 1.0 s, 1.5 s}. Thus,
TSC_high obtains no additional delay, while the deadline of TSC_low packets is increased by
the parameter M_low.
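The MEDF selection rule can be sketched as follows. This is a minimal sketch, not the implementation from [7]. It assumes arrival times are non-decreasing and breaks deadline ties in favour of the higher-priority (lower index) TSC. The class name `MEDFScheduler` is hypothetical.

```python
from collections import deque


class MEDFScheduler:
    """Modified EDF: one FIFO per TSC, packets stamped with arrival time + M_i."""

    def __init__(self, offsets):
        self.offsets = list(offsets)               # M_i for each TSC, in seconds
        self.queues = [deque() for _ in offsets]   # TSC-specific FIFO queues

    def enqueue(self, tsc, arrival_time, packet):
        deadline = arrival_time + self.offsets[tsc]
        self.queues[tsc].append((deadline, packet))

    def dequeue(self):
        """Return the head-of-line packet with the earliest modified deadline.

        Within one queue the deadlines are non-decreasing (constant offset,
        FIFO order), so comparing the queue heads suffices; ties go to the
        lower TSC index, i.e. the higher priority.
        """
        best = None
        for tsc, q in enumerate(self.queues):
            if q and (best is None or q[0][0] < self.queues[best][0][0]):
                best = tsc
        if best is None:
            return None
        return self.queues[best].popleft()[1]


# Two TSCs as in the simulations: M_high = 0 s, M_low = 0.5 s.
sched = MEDFScheduler([0.0, 0.5])
sched.enqueue(1, 0.0, "low-1")    # deadline 0.5
sched.enqueue(0, 0.3, "high-1")   # deadline 0.3
sched.enqueue(0, 0.6, "high-2")   # deadline 0.6
print([sched.dequeue() for _ in range(3)])
```

The example shows the relative delay advantage at work. A TSC_low packet that has already waited long enough is served before a later TSC_high packet, unlike under Static Priority.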



3 MEDF Performance Evaluation 



In this section we describe the general goals and approach of our performance evaluation
study and present the results. We used the network simulator (NS) version 2 [15] to run
the experiments, deploying the RENO TCP implementation [16]. Standard simulation
methods such as replicate-delete were applied to obtain statistically reliable results for the
non-ergodic random processes. In the following sections we only give average values, as
the simulated time was chosen to yield very narrow confidence intervals. Our goal is to
measure the prioritization of TSC_high traffic. For that purpose, we define the
normalized bandwidth ratio. Let n_high be the number of TSC_high flows and n_low the number
of TSC_low flows. The functions B(TSC_high) and B(TSC_low) denote the bandwidth
used by all TSC_high and all TSC_low flows, respectively. The normalized bandwidth ratio
B_ratio(TSC_high, TSC_low) is the bandwidth used by TSC_high per flow
divided by the bandwidth used by TSC_low per flow:

$$B_{ratio}(TSC_{high}, TSC_{low}) = \frac{B(TSC_{high}) / n_{high}}{B(TSC_{low}) / n_{low}}$$

A mechanism with traffic-mix-independent per-flow prioritization among TSCs exhibits
the same normalized bandwidth ratio regardless of the traffic mix. Unless stated
otherwise, the number of saturated TCP sources is the same for both TSCs in the
following.
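The ratio is straightforward to compute from measured per-class bandwidths. The sketch below uses a hypothetical fixed 3:1 split of 1.28 Mbit/s (960 vs. 320 kbit/s), which a scheduler with static per-class weights might produce. It shows why fixed shares are not traffic-mix independent: the per-flow ratio changes with n_high : n_low. The function name is hypothetical and the numbers are illustrative, not simulation results.

```python
def normalized_bandwidth_ratio(b_high, n_high, b_low, n_low):
    """B_ratio(TSC_high, TSC_low): per-flow bandwidth of TSC_high divided by
    the per-flow bandwidth of TSC_low."""
    if n_high <= 0 or n_low <= 0 or b_low <= 0:
        raise ValueError("need flows in both TSCs and nonzero TSC_low bandwidth")
    return (b_high / n_high) / (b_low / n_low)


# Hypothetical fixed 3:1 split in kbit/s; the flow mix varies.
for n_high, n_low in [(1, 3), (1, 1), (3, 1)]:
    print(n_high, n_low, normalized_bandwidth_ratio(960, n_high, 320, n_low))
```

A traffic-mix-independent mechanism such as MEDF would instead keep this value roughly constant across the three mixes.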

3.1 MEDF Characteristics 

To isolate the general behavior more easily and to eliminate unpredictable side effects,
we start with single link simulations and then extend them to multiple links.

MEDF Single Link Scenario 

Simulation environment. We use the classical dumbbell topology for our single link
simulation environment. A number of TSC_high TCP traffic sources and a number of
TSC_low TCP traffic sources connect to router A. Router A uses a space priority and a time
priority mechanism as described above and sends the packets over a single link to router
B. Router B has sufficient capacity to serve the link, and its only task is to distribute
the arriving packets to the corresponding destinations.

We choose the number of simultaneously active TCP connections as n = n_min · 2^i,
i ∈ {0, ..., 8}, n_min being the minimum number of TCP connections needed to reach a
theoretical load of 100% on the link. With fewer connections there is no overload, space
and time priorities have no effect, and flow control is not active. Here n_min = 2. The
packet size S(P) is a common standard value of 500 bytes including headers. Regarding the
link parameters, with the link bandwidth C_l = 1.28 Mbit/s, we set the link propagation
delay D_prop to 46.875 ms so that the theoretical round trip time sums up to

$$RTT = 2 \cdot n_{links} \cdot D_{prop} + (n_{links} + 1) \cdot D_{tx} = 2 \cdot 1 \cdot 46.875\,\text{ms} + 2 \cdot 3.125\,\text{ms} = 100\,\text{ms},$$

where D_tx = S(P)/C_l is the transmission delay to send a packet and n_links the number
of links between routers A and B.

The default value for the buffer size S_Buffer is 160 packets, so that a router can store
0.5 s worth of transmission (160 · 3.125 ms). We use the parameters mentioned here
as default parameters and state their values in the following text only where they
differ. Other parameters, such as algorithm-specific settings, are subject to
the analysis, and we indicate their values where appropriate.
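The default single link parameters can be checked with a few lines of Python. This is only an arithmetic sketch. The constant names are ours, and the RTT line uses the reading RTT = 2·n_links·D_prop + (n_links + 1)·D_tx, because that form reproduces the stated 100 ms with D_prop = 46.875 ms.

```python
PACKET_SIZE_BITS = 500 * 8   # S(P) = 500 bytes including headers
LINK_RATE_BPS = 1.28e6       # C_l = 1.28 Mbit/s
D_PROP = 0.046875            # link propagation delay: 46.875 ms
N_LINKS = 1                  # single link between routers A and B
BUFFER_PACKETS = 160         # default S_Buffer

d_tx = PACKET_SIZE_BITS / LINK_RATE_BPS              # transmission delay per packet
rtt = 2 * N_LINKS * D_PROP + (N_LINKS + 1) * d_tx    # theoretical round trip time
buffer_drain = BUFFER_PACKETS * d_tx                 # time to transmit a full buffer

print(f"D_tx = {d_tx * 1e3:.3f} ms, RTT = {rtt * 1e3:.1f} ms, buffer = {buffer_drain:.2f} s")
```

The buffer drain time of 0.5 s is the unit behind phrases such as "M_low = 0.5 s, i.e., one buffer size" in the experiments.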

Simulation. Figure 2 shows the normalized bandwidth ratio B_ratio(TSC_high, TSC_low)
for traffic mixes n_high : n_low of 1:3, 1:1, and 3:1 with MEDF parameter M_low = 0.5 s,
i.e., the transmission time of one full buffer. The link bandwidth is the x-axis parameter.
The value n_min = 2 is omitted here and in the following figures, as there is virtually no
prioritization for the minimum number of users: the link capacity is shared between the
single user of each class, so both reach the maximum rate. As expected, this behavior
results from the lack of competition on the link.
Fig. 2. MEDF prioritization independent of the traffic mix
Fig. 3. MEDF prioritization for two TSCs



The figure shows the traffic-mix-independent per-flow prioritization property of
MEDF. The small differences under low congestion are due to the slight influence of the
buffer space: under low congestion, a few high priority flows can occupy relatively more
buffer space per flow than many high priority flows. However, the difference is
negligibly small and the normalized bandwidth ratio converges very quickly. In contrast,
conventional algorithms like WRR are insensitive to the traffic mix, and therefore their
normalized bandwidth ratio would decrease severely as the ratio of TSC_high traffic
to TSC_low traffic grows. We further emphasize that this property is achieved by a single
parameter per class and originates from the relative delay advantage controlled by MEDF.

The traffic-mix-independent per-flow prioritization property of MEDF was already
shown on the packet level in [7]. Since this property also holds on the TCP flow level,
we use the same number of saturated TCP sources for both TSCs in the following.

For the minimum number of users n_min = 2, there is virtually no prioritization for
lack of competition. Prioritization of TSC_high traffic reaches its maximum at n = 4
users (2 users per TSC) and degrades as the number of users rises. As we cannot
simulate any value between two and four users (one and two users per TSC), we vary
the bandwidth while keeping the number of users fixed at 4 in Fig. 3. This gives a more
continuous range from which to derive the basic characteristics of the algorithm, and
it demonstrates the behavior at various levels of slight congestion in real networks.

At a bandwidth of 1.280 Mbit/s this experiment corresponds to a simulation with
default values and 4 users; at a bandwidth of 2.560 Mbit/s it is equivalent to 2 users.
Higher offset values M_low lead directly to a higher prioritization of TSC_high packets.
The throughput ratio rises with the bandwidth; increasing the bandwidth has the same
effect as decreasing the number of users. Low bandwidth (and likewise many users)
dramatically limits the rate that TSC_high connections can achieve. Moreover, the
actually measured round trip time increases, which lowers the maximum obtainable rate.
Thus, TSC_low connections are able to grasp a higher relative share of the bandwidth.
The bandwidth ratio rises until it reaches a maximum. Beyond that point, sufficient
capacity gradually becomes available for both TSCs, and low priority packets can use more
of the additional bandwidth. At 2.560 Mbit/s there is virtually no competition for
bandwidth anymore.

Another aspect that is important for understanding MEDF is the interaction
of M_low and the round trip time. The round trip time rises both with increasing traffic on
the link and with decreasing bandwidth, since low bandwidth results in a longer
transmission delay. The delay advantage M_low then becomes smaller relative to the round
trip time, and the prioritization decreases.

The MEDF parameter M_low can be used to adjust the priority ratio for the anticipated
level of competition for network resources. If sufficient resources are available, the
MEDF algorithm does not influence normal network operation. For very scarce resources
(here, large numbers of users and low bandwidth, respectively), the network is under
heavy overload, and anticipatory action such as admission control, which blocks some of
the connections, must be taken to prevent such situations. Otherwise, only a very small
portion of the overall bandwidth remains for each TSC_high flow anyway, no matter
whether these flows receive preferential service or not. For low and medium overload,
MEDF shows a very clear and easily adjustable behavior.



MEDF Multi Link. We now extend our single link experiment to multiple links to 
assess the influence of MEDF on TSC priority if applied multiple times. 

Simulation environment. Figure 4 shows the simulation topology for the multi link 
experiment in the case of two links. If we simply add additional links and routers, the 
first router receives the packets from the TCP sources in an unordered way and applies 
the priority algorithm. Thus, the packets arrive at the router serving the next link one 
by one and the priority algorithm has no additional effect. To overcome this problem, 
we introduce cross traffic. Additional TCP sources connect to the interior routers and 
generate traffic that crosses the way of the measured traffic. 



Fig. 4. Multi link simulation topology: TCP traffic sources and destinations of TSC_high and TSC_low, plus cross traffic (CT) TCP sources and destinations attached to the interior routers

It is important to send the cross traffic over the same number of links so that the
measured traffic and the cross traffic have comparable round trip times. Furthermore,
the round trip time should be the same in the single link and the multi link experiments.
Otherwise, significant parameters that depend on the round trip time, such as the
maximum rate that a TCP connection can achieve, differ and the experiments are not
comparable. Therefore, we calculate the new link propagation delay … ms.

Performance of TCP/IP with MEDF Scheduling



The TCP connections need the same bandwidth per flow on all links. If the bandwidth 
differs from link to link, the link with the lowest capacity becomes the bottleneck and 
dominates the observable effects. However, doubling the bandwidth of the links with 
cross traffic solves this problem. 



Simulation. Figure 5 shows the effect of MEDF over multiple links. We used the standard
parameters with M_low = 1 s, i.e., twice the buffer transmission time, and the default Full
Buffer Sharing mechanism as buffer management.

In general, the degree of prioritization of TSC_high increases with the number of links
on the path, hence with the number of MEDF scheduling instances applied. However, the
increase in priority is much more pronounced when the competition for network resources
is low. The reason is similar to that in the single link experiment: the bandwidth
theoretically available to a single connection is higher, hence the actually measured
round trip time is lower. Therefore, a small number of TSC_high connections achieves
higher rates than in a highly overloaded network. Rising competition for network
resources makes the conditions for TSC_high more disadvantageous, and TSC_low obtains a
larger share of the bandwidth. The priority does not increase linearly as additional
links are added. The overall bandwidth ratio can be controlled by setting the MEDF
parameter appropriately.



MEDF and Space Priority. We now consider the MEDF characteristics when space priority
mechanisms are used. Figure 6 shows the influence of the buffer sharing option.
FIFO with FBS leads to an even division of the available bandwidth between both TSCs, as
no packet preferences exist. FIFO with BSSP divides the bandwidth equally as long as
there is enough buffer space available (n < 2). The bandwidth ratio then reaches its
maximum when the router buffers fill completely and flattens slightly under heavy
traffic load.





Fig. 5. MEDF prioritization in a multi link topology
Fig. 6. MEDF and the impact of space priority
MEDF with parameter M_low = 0.5 s and FBS clearly outperforms both FIFO experiments
and exhibits the behavior characterized in the preceding sections. If we add
BSSP, we observe a superposition of the MEDF curve and the curve for FIFO with BSSP.
For few users we clearly identify the typical MEDF characteristics; for more users the
router buffers fill completely and space priority comes into play. Thus, space priority
prevents the typical decrease of the bandwidth ratio.

3.2 MEDF in Comparison to Other Priority Mechanisms 

We used FIFO as the comparison baseline in the previous experiments. FIFO does not 
prioritize the traffic in time and therefore is one extreme of the spectrum of time priority 
mechanisms. Another extreme is Static Priority (SP). 

Static Priority (SP). Under network congestion, the time priority mechanism Static
Priority leads to starvation of TSC_low regardless of the buffer management in use. There
are always TSC_high packets waiting in the router queues. SP dequeues those packets first,
and even though the TSC_low packets occupy most of the buffer space, their chance
to leave the buffer is very low; thus, the TCP timers for those connections expire.
Accordingly, the TCP source tries to re-establish the connection but suffers from
starvation again. As a consequence, SP is completely inadequate for severely congested
networks. In contrast to MEDF, it does not consider a maximum delay for low priority
traffic that would prevent this effect.

For a comparison to RED, a pure space priority algorithm, we refer to our technical 
report [10]. 

4 Conclusion 

In this work we examined the impact of the pure time priority (packet scheduling)
mechanism Modified Earliest Deadline First (MEDF) in congested TCP/IP networks.
Conventional algorithms like Weighted Round Robin (WRR), Deficit Round Robin
(DRR) or Class Based Queuing (CBQ) assign fixed bandwidth shares among Transport
Service Classes (TSCs) of different priorities. This is problematic with a varying number
of users per TSC and saturated TCP sources. If the TSC of high priority (TSC_high) holds
more users than expected and the TSC of low priority (TSC_low) holds fewer users, then
the Quality of Service (QoS) for TSC_high can be worse than for TSC_low. MEDF, however,
achieves a relative traffic-mix-independent per-flow prioritization among all TSCs.

In contrast to MEDF, Static Priority (SP) leads to starvation of low priority traffic,
while First In First Out (FIFO) provides no prioritization at all. MEDF achieves the desired
priority ratio of the high priority TSC over the low priority TSC in realistic overload
situations by means of its adjustable parameter M_low, which reflects a relative delay advantage.

Full Buffer Sharing (FBS) was the default buffer management scheme in our experiments.
To estimate the influence of buffer management algorithms, we combined
MEDF with Buffer Sharing with Space Priority (BSSP). The results showed an increased
prioritization of TSC_high.

In conclusion, MEDF has powerful service differentiation capabilities, and our performance
study revealed that it is an attractive mechanism to achieve service differentiation
for TCP flows in congested networks. MEDF is especially interesting since it does not
require per-class bandwidth configuration, which might be problematic when the traffic
mix is unknown.

5 Outlook 

The practical adaptation of the relative delay parameter M_low, especially its dependence
on the propagation delay, is an interesting field of further research.
Abstract. We present a new edge-to-edge management technique,
called Ping Trunking, that can provide soft service guarantees to aggregated
traffic. A Ping trunk is an aggregate traffic stream that flows
between two nodes in a network at a rate dynamically determined by
a Vegas-like congestion control mechanism. A management connection
associated with each trunk is in charge of regulating the user data transmission
rate. Thanks to this management, trunks are able to probe for available
bandwidth, adapting themselves to changing network conditions in accordance
with their subscribed target rates. We demonstrate the effectiveness of our proposal
analytically and through simulation experiments.

Keywords: Traffic management, aggregated traffic, TCP Vegas. 



1 Introduction 

Only two simple key principles, namely the best effort paradigm for data
transport and the end-to-end philosophy for traffic control and management,
have prevented the current Internet from collapsing while it underwent exponential
growth. However, the best effort model with no service guarantee is no
longer acceptable in view of the proliferation of interactive applications such
as Internet telephony or video conferencing. Over the past years, the IETF
has standardized several frameworks to meet the demand for Quality of
Service (QoS) support, such as RSVP [1] in the control plane or
Differentiated Services [2] in the architecture area.

For scalability reasons, it is impractical to enforce performance guarantees at
a fine-grained level (e.g., per flow), so QoS requirements will most likely
be applied to aggregate traffic streams rather than to individual flows. An aggregate
traffic stream bundles a number of flows for common treatment between
two nodes in a network. Such aggregation clearly simplifies the allocation
of network resources and notably promotes the deployment of QoS frameworks.

* This work was supported by the project TIC2000-1126 of the “Plan National de 
Investigation Cientifica, Desarrollo e Innovation Tecnologica” and by the “Secretaria 
Xeral de I+D da Xunta de Galicia” through the grant PGIDT01PX132202PN. 
For similar reasons, service providers are likely to offer performance commitments
at the aggregate level, either to end users or to other peer providers.
Within this context, and irrespective of the end-to-end transport-level protocols
used by individual flows, it is essential that aggregated traffic be responsive to
network congestion at an aggregate level. Several approaches have been proposed to apply
congestion control to aggregates, such as the Congestion Manager [3] and Coordination
Protocol [4] architectures, but they require the modification of user applications at
the endpoints and are therefore not suitable for the Internet
backbone. In contrast, TCP Trunking [5,6] is an interesting method for
managing aggregate traffic streams without changing either user protocols
or applications. This technique employs several control TCP connections
to regulate the flow of each aggregate traffic stream into the network.

In this paper we introduce Ping Trunking, an enhancement of the
TCP Trunking technique that improves on the original by changing the control
overlay. In particular, instead of using several TCP connections to manage each
aggregate, a new preventive Vegas-like connection is used. Ping Trunking has
several advantages over TCP Trunking. Foremost, owing to the dynamics of its Vegas-like
congestion response, it does not cause sharp variations in the transmission rate of the
trunks, and it reduces the size of the queues at the core of the network.
Additionally, it introduces far less overhead and makes trunk operation simpler.

The remainder of this paper is organized as follows. In Sect. 2, we give a brief
overview of TCP Trunking, the traffic management technique on which Ping Trunking
is based. Section 3 describes the Ping Trunking mechanism. In Sect. 4, we
present a simple analysis of its performance. Section 5 contains some ns-2 simulation
experiments to validate the theoretical analysis. TCP and Ping Trunking
are compared in Sect. 6. We end the paper with some concluding
remarks and directions for future work in Sect. 7.

2 TCP Trunking 

A TCP trunk [5,6] is an aggregate traffic stream whose data packets are transmitted
at a rate dynamically determined by the TCP congestion control algorithm [7].
Each trunk carries a varying number of user flows between two
nodes of the network. The flow of the aggregated traffic is regulated by a control
TCP connection established between the two edges of the trunk. This control
connection injects control TCP packets into the network to probe its congestion
level (Fig. 1). TCP Trunking fully decouples control from data transmission: the
injection of control packets does not depend on the user data protocols but, as in
ordinary TCP connections, on control packet drops. In addition, trunks do not
retransmit lost user packets. If required, retransmissions should be handled by the
user applications on the end hosts.

TCP Trunking is implemented as follows. User packets arriving at the sender edge
are temporarily queued in the trunk's buffer. After a control packet is transmitted,
user packets totalling at most vmss (virtual maximum segment size) bytes can be
forwarded. When vmss user bytes have been transmitted, the control connection
generates and sends a control packet if its control TCP congestion window allows it.
In this way, control packets regulate the transmission of user packets, so the loss of
control packets in the network not only slows down the transmission rate of control
packets but also reduces the transmission rate of user data.

Fig. 1. TCP Trunking basics
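The vmss gating described above can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions: the control window is kept fixed (no TCP increase or halving), user packets never exceed vmss, and a new control packet is emitted only when queued user data is waiting. The class `TcpTrunkSender` and its methods are hypothetical, not part of the TCP Trunking implementation in [5,6].

```python
from collections import deque


class TcpTrunkSender:
    """Toy sender edge of a TCP trunk: every control packet releases at most
    vmss bytes of queued user data (the control window is kept fixed here)."""

    def __init__(self, vmss, cwnd):
        self.vmss = vmss            # virtual maximum segment size (bytes)
        self.cwnd = cwnd            # control TCP window, in control packets
        self.in_flight = 0          # control packets not yet acknowledged
        self.allowance = 0          # user bytes still covered by the last control packet
        self.user_queue = deque()   # sizes of queued user packets (bytes)
        self.log = []               # "ctrl" or user packet size, in sending order

    def enqueue(self, size):
        if size > self.vmss:
            raise ValueError("user packets are assumed not to exceed vmss")
        self.user_queue.append(size)

    def try_send(self):
        while self.user_queue:
            size = self.user_queue[0]
            if size > self.allowance:
                if self.in_flight >= self.cwnd:
                    return          # window closed: user data waits in the trunk buffer
                self.in_flight += 1
                self.log.append("ctrl")
                self.allowance = self.vmss
            self.user_queue.popleft()
            self.allowance -= size
            self.log.append(size)

    def on_control_ack(self):
        self.in_flight -= 1
        self.try_send()
```

With vmss = 3000 and a window of one control packet, four 1500-byte user packets leave in two groups of two, each preceded by a control packet. The second group departs only after the first control ack arrives, so losing control packets directly throttles user data.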

Multiple control TCP connections are employed with each trunk. If each
trunk were regulated by a single control connection, a control packet loss would
cause the entire trunk to halve its sending rate, as dictated by the TCP dynamics.
When dealing with traffic aggregates, such an abrupt reduction in transmission
rate on packet loss is undesirable. In [6], the authors argue that four control
connections per trunk are enough to produce smooth bandwidth transitions.

To conclude this overview, it is important to point out that both user
and control packets must follow the same path between the edges of the trunk to
ensure that the control connections probe the available bandwidth of that path. This
requirement can be fully guaranteed if trunks run on top of ATM virtual
circuits or MPLS label-switched paths [8].

3 Ping Trunking 

Our proposal, named Ping Trunking, borrows from TCP Trunking the concept
of decoupling control from data in IP networks, but improves the original technique
by changing the control overlay. In Ping Trunking, only a single
control connection is established between the two network edges of each trunk.
This connection controls the flow of user packets into the core of the network
using a Vegas-like congestion control mechanism that adapts its congestion window
(cwnd) based on the observed changes in the round-trip time (RTT), and
not only on the loss of control packets. Figure 2 provides greater detail on the
operation of this mechanism. Incoming user packets at the sender edge are classified
as belonging to a particular trunk and queued in the corresponding buffer.¹
User packets can only be forwarded when credit for their trunk is available. The
credit value represents the amount of user data allowed to be forwarded. When
a user packet is sent, the credit is decremented by the size of the packet. When
a control packet is sent, the credit is incremented by cwnd bytes. Therefore, the
transmission of user data is regulated by both the forwarding of control packets
and the cwnd value.

Fig. 2. Ping Trunking diagram

¹ The identification method for classifying packets from different aggregates could be
based on the value of several header fields, such as source or destination addresses,
source or destination port numbers, ATM virtual circuit identifier or MPLS label.
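The credit rule of Ping Trunking can be sketched as a small bookkeeping class. This is a minimal sketch, assuming that "credit is available" means the remaining credit covers the whole head-of-line packet. The text does not specify the exact test. The class and method names are hypothetical.

```python
from collections import deque


class PingTrunkSender:
    """Credit-based forwarding at the sender edge of a Ping trunk (sketch).

    Sending a control packet adds cwnd bytes of credit; a user packet may
    leave only if the credit covers it, and sending it consumes its size.
    """

    def __init__(self, cwnd):
        self.cwnd = cwnd            # congestion window of the control connection (bytes)
        self.credit = 0             # user bytes currently allowed to be forwarded
        self.user_queue = deque()   # sizes (bytes) of queued user packets

    def on_control_packet_sent(self):
        self.credit += self.cwnd
        return self.forward()

    def forward(self):
        sent = []
        while self.user_queue and self.user_queue[0] <= self.credit:
            size = self.user_queue.popleft()
            self.credit -= size
            sent.append(size)
        return sent
```

Leftover credit carries over to the next control packet, so on average cwnd bytes of user data leave per control packet, that is, per RTT. This is the relation cwnd = xD used in the analysis of Sect. 4.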



3.1 Control Connection 

Control TCP connections have been replaced by a new control connection
whose main task is to measure the RTT between the edges of the trunk
accurately. The Vegas-like congestion control mechanism uses
this RTT estimate to adapt the transmission rates of the trunks. Let us
start with a brief description of the operations performed by the control
connection. When the first user packet arrives at the trunk, a control packet
is generated and sent. For each control packet that reaches the receiver edge
of the trunk, a corresponding acknowledgment (ack) is generated. The arrival
of an ack back at the sender edge triggers the transmission of a new control
packet. Therefore, the control connection only sends control packets on reception
of acks. To avoid the starvation of control connections, a time-out timer
is needed. This timer is started every time a control packet is sent. If the ack of
the last control packet sent does not arrive at the sender edge before the timer
expires, the control connection considers the packet lost and
sends a new control packet.²

Control connections use a method similar to the one used for TCP RTT estimation.
To carry out this task, they timestamp every control packet with their local
time, and this timestamp is echoed in the acks. The last RTT sample is computed
as the difference between the current time and the timestamp field in the ack
packet. The RTT is then estimated using an exponential moving average taken over
the RTT samples.
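The timestamp-echo RTT estimation can be sketched as follows. This is a minimal sketch. The smoothing gain 1/8 is an assumption borrowed from TCP's smoothed RTT estimator, because the text only says that an exponential moving average is used. The class also tracks the minimum RTT, which Sect. 3.2 uses as the estimate of d. `ControlConnectionRtt` is a hypothetical name.

```python
class ControlConnectionRtt:
    """RTT bookkeeping of a Ping Trunking control connection (sketch).

    The gain 1/8 is an assumption taken from TCP's smoothed RTT; the text
    only states that an exponential moving average is used.
    """

    def __init__(self, gain=0.125):
        self.gain = gain
        self.srtt = None      # D: smoothed RTT estimate
        self.min_rtt = None   # d: minimum RTT seen, estimate of the propagation delay

    def on_ack(self, now, echoed_timestamp):
        sample = now - echoed_timestamp    # last RTT sample
        if self.srtt is None:
            self.srtt = sample
        else:
            self.srtt = (1 - self.gain) * self.srtt + self.gain * sample
        self.min_rtt = sample if self.min_rtt is None else min(self.min_rtt, sample)
        return sample
```

Because the sender edge stamps packets with its own clock and reads the echo on the same clock, the two trunk edges need no clock synchronization.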

2 The control connection proposed is analogous to the ping command used to send 
ICMP ECHO_REQUEST/REPLY packets to network hosts. This explains why we 
refer to our mechanism as Ping Trunking. 
3.2 Vegas-Like Congestion Control Mechanism

The transmission rate of each trunk should adapt dynamically to the current
network conditions. We propose the use of a Vegas-like
congestion control mechanism to discover the available bandwidth that each
trunk should obtain. TCP Vegas [9] is an implementation of TCP that employs
proactive techniques to increase throughput and decrease packet losses.
The congestion control mechanism introduced by Vegas gives TCP the ability to
detect incipient congestion before losses are likely to occur. We devise a similar
mechanism adapted to trunks.

Upon receiving each ack, control connections calculate the expected throughput
and the actual throughput as in TCP Vegas. Assuming that active
trunks are not overflowing the path, the expected throughput can be calculated
as cwnd/d, where d is the round-trip propagation delay, which can be estimated as
the minimum of all measured RTTs. The actual throughput, on the other hand, is
given by cwnd/D, where D is the RTT estimate. These throughputs are compared,
and the control connections adjust their congestion windows accordingly.
Let Diff be the difference between the expected and the actual throughput:

$$\mathit{Diff} = \mathit{Expected} - \mathit{Actual} = \left(\frac{cwnd}{d} - \frac{cwnd}{D}\right) d \qquad (1)$$

The Diff value has been scaled with the minimum RTT so that Diff can be seen 
as the amount of user data in transit. 

Two thresholds are defined: α and β, with α < β. When Diff < α, a trunk
is allowed to increase its amount of user data in transit, and therefore the
assigned control connection increases its congestion window linearly. If Diff > β,
the control connection is forced to decrease its congestion window linearly. In
any other case, the congestion window remains unchanged. This mechanism stabilizes
the value of the congestion window and reduces packet drops. If a control
packet loss is detected, the congestion window is halved, but this should
happen only sporadically.
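The per-ack window adjustment can be written as a small pure function. This is a minimal sketch, assuming cwnd, α and β are expressed in bytes. The linear step size `step` and the floor of one step on decrease are hypothetical parameters that the text does not fix.

```python
def vegas_like_update(cwnd, d, D, alpha, beta, step):
    """One per-ack adjustment of the control connection's congestion window.

    cwnd: window in bytes; d: minimum RTT; D: smoothed RTT estimate.
    alpha < beta: thresholds in bytes; step: hypothetical linear step in bytes.
    """
    diff = (cwnd / d - cwnd / D) * d     # Eq. (1): user data queued in the network
    if diff < alpha:
        return cwnd + step               # room left: grow linearly
    if diff > beta:
        return max(cwnd - step, step)    # too much queued: shrink linearly (assumed floor)
    return cwnd                          # alpha <= Diff <= beta: keep the window


def on_control_loss(cwnd, step):
    """Halve the window when a control packet loss is detected (assumed floor)."""
    return max(cwnd / 2, step)
```

For example, with d = 0.125 s and D = 0.15625 s, a 20000-byte window gives Diff = 4000 bytes. With α = 3000 and β = 5000 the window stays put. This dead band between α and β is what stabilizes the window.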



3.3 Avoiding Burstiness 



Our mechanism as described so far may introduce long bursts of data packets
of length cwnd bytes into the network. However, this undesired effect can easily be
avoided by fixing a maximum burst length (max-burst-length) after which
trunks have to wait burst-delay seconds before sending the next packet. This
value is computed as follows:



$$\text{burst-delay} = D \cdot \frac{\text{max-burst-length}}{cwnd} \qquad (2)$$
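Eq. (2) spreads the cwnd bytes released per control packet over roughly one RTT. The sketch below computes the pause and a resulting burst schedule. The helper `schedule_bursts` is hypothetical and assumes all released credit is sent immediately in maximal bursts.

```python
def burst_delay(D, max_burst_length, cwnd):
    """Eq. (2): pause between bursts so that cwnd bytes span about one RTT D."""
    return D * max_burst_length / cwnd


def schedule_bursts(credit_bytes, D, max_burst_length, cwnd):
    """Split released credit into bursts of at most max_burst_length bytes and
    return (start_time, burst_size) pairs relative to the control packet."""
    pause = burst_delay(D, max_burst_length, cwnd)
    bursts, t = [], 0.0
    while credit_bytes > 0:
        size = min(max_burst_length, credit_bytes)
        bursts.append((t, size))
        credit_bytes -= size
        t += pause
    return bursts
```

With D = 0.125 s, cwnd = 12000 bytes and max-burst-length = 3000 bytes, the pause is 31.25 ms. The four bursts start at 0, 31.25, 62.5 and 93.75 ms, so the window is paced across one RTT instead of leaving in a single burst.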



4 Performance Analysis 

Consider a network shared by a set T of trunks. Let C denote the network
capacity. Each trunk i ∈ T has an associated subscribed target rate r_i. The
overall aggregated demand R is the sum of the subscribed target rates of all active
trunks. If R < C, the excess unsubscribed bandwidth should be distributed
among trunks in proportion to their contracted target rates. Therefore, each trunk
should ideally receive the following share of bandwidth:

$$r_i + (C - R)\frac{r_i}{R} = \frac{r_i C}{R}, \quad \text{where } R = \sum_i r_i \qquad (3)$$

According to one interpretation of Vegas [10], the congestion windows of the control
connections must satisfy the following equation in equilibrium (we assume
for simplicity that α_i = β_i):



cumdi 

di 



cumdi 

Di 




( 4 ) 



On the other hand, the transmission rate x_i of each trunk is determined by the
cwnd value of its corresponding control connection (cwnd_i = x_i D_i). Substituting
this in (4), we have



$$\left(\frac{x_i D_i}{d_i} - \frac{x_i D_i}{D_i}\right) d_i = \alpha_i \tag{5}$$



and, from (5), it follows that 



$$x_i = \frac{\alpha_i}{D_i - d_i} \tag{6}$$

The RTT can be calculated as the sum of two delays: the round-trip propagation delay ($d_i$) and the queueing delay ($B/C$), where $B$ denotes the total backlog buffered in the network. Then, from (6), and using $D_i = d_i + B/C$, the transmission rate of each trunk can be expressed as



$$x_i = \frac{\alpha_i C}{B} \tag{7}$$



Finally, equating (3) and (7) yields the value of the $\alpha_i$ threshold that allocates to each trunk its desired share of bandwidth:



$$\alpha_i = \frac{r_i B}{R} \tag{8}$$



Therefore, to compute its $\alpha_i$ threshold, each trunk must know both $B$ and $R$. The $B$ parameter should be set by the network manager to a fixed, low value so that queues at the core remain small. However, the need to determine the overall aggregated demand at all edge nodes may complicate our proposal substantially.
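Equations (7) and (8) can be checked numerically with the parameters of Section 5 ($C$ = 10 Mbps, $B$ = 10 000 bytes). The sketch below is our own illustration of the fluid model; the trunk names and the 1 Mbps / 3 Mbps contracts are example values.

```python
def alpha_thresholds(target_rates: dict, backlog: float) -> dict:
    """Eq. (8): alpha_i = r_i * B / R, where R is the sum of all target rates."""
    total_demand = sum(target_rates.values())
    return {trunk: r * backlog / total_demand for trunk, r in target_rates.items()}


def equilibrium_rate(alpha: float, capacity: float, backlog: float) -> float:
    """Eq. (7): x_i = alpha_i * C / B."""
    return alpha * capacity / backlog


CAPACITY_MBPS = 10.0      # bottleneck capacity C
BACKLOG_BYTES = 10_000.0  # total backlog B chosen by the network manager
targets = {"trunk1": 1.0, "trunk2": 3.0}  # subscribed rates r_i in Mbps (example)

alphas = alpha_thresholds(targets, BACKLOG_BYTES)
for trunk, alpha in alphas.items():
    rate = equilibrium_rate(alpha, CAPACITY_MBPS, BACKLOG_BYTES)
    print(f"{trunk}: alpha = {alpha:.0f} bytes, x = {rate:.2f} Mbps")
```

The thresholds come out as 2 500 and 7 500 bytes, and the resulting rates of 2.5 and 7.5 Mbps equal $r_i C / R$ from (3).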

Fortunately, we can show that the overall demand does not need to be known accurately. Assume that a value $\hat{R} \neq R$ was used as the aggregated demand. The total backlog $\hat{B}$ actually buffered in the network is the sum of the $\alpha_i$ thresholds of all competing trunks. Then,



$$\hat{B} = \sum_i \alpha_i = \sum_i \frac{r_i B}{\hat{R}} = \frac{B}{\hat{R}} \sum_i r_i = \frac{B R}{\hat{R}} \tag{9}$$
From (7), and using $\hat{B}\hat{R} = BR$, which follows from (9), we can conclude that the fairness condition is still satisfied even though the value $\hat{R}$ employed is incorrect:



$$x_i = \frac{\alpha_i C}{\hat{B}} = \frac{r_i B C}{\hat{R} \hat{B}} = \frac{r_i C}{R} \tag{10}$$
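The cancellation in (9) and (10) can be reproduced numerically. This sketch evaluates the idealized fluid model only (it is not a simulation); the true demand of 2 Mbps and the range of $\hat{R}$ mirror the second experiment of Section 5, and the function name is hypothetical.

```python
def rates_with_assumed_demand(target_rates, capacity, backlog, assumed_demand):
    """Equilibrium rates when every trunk uses R_hat instead of the true R."""
    # Eq. (8) evaluated with the (possibly wrong) aggregated demand R_hat.
    alphas = {t: r * backlog / assumed_demand for t, r in target_rates.items()}
    # Eq. (9): the backlog actually buffered is the sum of all thresholds (= B*R/R_hat).
    actual_backlog = sum(alphas.values())
    # Eq. (10): x_i = alpha_i * C / B_hat, which reduces to r_i * C / R.
    return {t: a * capacity / actual_backlog for t, a in alphas.items()}


targets = {"trunk1": 1.0, "trunk2": 1.0}  # true R = 2 Mbps
for r_hat in (1.0, 2.0, 3.0, 4.0, 5.0):
    rates = rates_with_assumed_demand(targets, 10.0, 10_000.0, r_hat)
    print(r_hat, {t: round(x, 3) for t, x in rates.items()})
```

Each trunk gets 5 Mbps for every $\hat{R}$. What does change is $\hat{B} = BR/\hat{R}$: underestimating $R$ enlarges the queue actually built at the core, so $\hat{R}$ only affects the backlog, not the shares.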



5 Performance Evaluation 

We have implemented Ping Trunking in the ns-2 simulator [11]. The following 
experiments have been conducted to validate the analysis performed in the pre- 
vious section. Figure 3 shows the network topology employed. It consists of three 
edge nodes and one core node belonging to a particular domain. Each edge node 
is connected to a client or traffic source. We consider two competing aggregates: 
aggregate 1 running from client 1 to client 3 and aggregate 2 running between 
clients 2 and 3. Therefore, both aggregates pass through a single bottleneck 
(link C-E3). Both aggregates comprise 20 TCP flows. All TCP connections es- 
tablished are modeled as eager FTP flows that always have data to send and 
last for the entire simulation time. We use the TCP New Reno implementation 
and the size of user packets is set to 1 000 bytes. 




Fig. 3. Network topology. Every link has a 10 Mbps capacity and a propagation delay of 10 ms. All queues are FIFO buffers limited to 50 packets.



A Ping trunk is used to manage each aggregate traffic stream: trunk 1 manages aggregate 1 and trunk 2 manages aggregate 2. The sender and receiver of trunk 1 are E1 and E3, respectively, while E2 and E3 are the sender and receiver of trunk 2, respectively. Control connections send 44-byte packets.³ Each trunk buffer is a simple FIFO queue with capacity for 50 packets. A total backlog of 10 000 bytes (10 packets) is chosen to be buffered in the network. The maximum burst length is set to 5 000 bytes (5 packets).

We run the simulations for 50 seconds and measure the average throughput achieved by each aggregate over the whole simulation period. Each simulation experiment is repeated 10 times, slightly changing the initial transmission time of each TCP flow, and the results are averaged over all runs.

3 Control connections do not actually transmit any real data, so control packets only 
consist of the TCP/IP header with no payload (40 bytes plus 4 additional bytes 
required to estimate RTT). 
Fig. 4. Performance evaluation results: (a) varying target rates (Agg 1 contracted rate = 1 Mbps; x-axis: Agg 2 contracted rate, Mbps); (b) varying the overall demand value employed (actual R = 2 Mbps; x-axis: overall aggregated demand, Mbps)



In the first experiment, aggregate 1 contracts a fixed throughput of 1 Mbps while the subscribed target rate of aggregate 2 varies from 1 to 5 Mbps. Figure 4(a) shows both the obtained and the expected results. As required, the network bandwidth is fairly distributed between the two competing aggregates according to their contracted rates. In the second experiment, both aggregates contract a throughput of 1 Mbps, which sets the overall aggregated demand $R$ to a fixed value of 2 Mbps. However, we force the trunks to employ a different value of $R$ when calculating their $\alpha$ thresholds, varying it from 1 to 5 Mbps. Figure 4(b) depicts the throughput obtained by each trunk. The simulation results confirm the theoretical analysis: our technique still works despite using an incorrect value of the overall aggregated demand.



6 Comparison with TCP Trunking 

In this section, we compare the TCP and Ping Trunking techniques. First, Ping Trunking simplifies the operation of trunks: while TCP Trunking assigns four control TCP connections to each aggregate to produce smoother behavior, each Ping trunk is regulated by a single control connection. Therefore, our technique is simpler, easier to understand, and more efficient to operate.

In addition, despite managing each aggregate with a single control connection, trunks do not suffer from sharp variations in their sending rate. The Vegas-like congestion control mechanism reduces control packet drops and stabilizes the transmission rate while maintaining the ability to adapt to changing conditions. To confirm this feature, we have conducted an additional simulation experiment using the same network topology and parameters as in the previous section. In this experiment, we compare the variations in the transmission rate of TCP and Ping trunks. First, we use TCP trunks to regulate the two competing aggregates; in this case, each aggregate is managed by four control TCP connections. Then, we use Ping trunks to manage them. The results obtained are shown in Fig. 5.⁴ Ping Trunking indeed causes smoother variations in the sending rate of the trunks than TCP Trunking.

Fig. 5. Transmission rate comparison: (a) TCP Trunking, (b) Ping Trunking

Fig. 6. Core queue size comparison
We have also compared the core queue size for the two techniques. As shown in Fig. 6, the core queue size is greatly reduced with Ping Trunking. Moreover, with our proposal, no packet loss takes place at the core node. This result can be explained by the different congestion control mechanisms used in each scheme: while the Reno algorithm induces losses to learn the available network capacity, Vegas sources adjust their rates so as to keep $\alpha$ packets buffered in their paths. A smaller queue also helps to reduce both the latency and the jitter experienced by packets.

The last issue to take into account is that both techniques add some overhead due to the introduction of control packets. TCP Trunking injects one control packet into the network every vmss data bytes. In the experiments conducted to study TCP Trunking performance [6], vmss was set to 1.5 or 3.0 kBytes to obtain good results. In Ping Trunking, a control packet is injected into the network every cwnd data bytes. This value is not fixed and depends on both the trunk rate and the RTT but, as can be inferred from the previous experiments, cwnd is considerably greater than the vmss used in TCP trunks for common networks and contracts (5 to 45 kBytes). Therefore, Ping trunks usually add far less overhead than TCP trunks.

4 We performed exhaustive simulations to verify that the examples shown are representative.
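The overhead comparison reduces to the ratio of control bytes to data bytes. In the sketch below, the 44-byte Ping control packet comes from footnote 3, but the 40-byte size of a TCP Trunking control packet is our assumption (a bare TCP/IP header), and kBytes are taken as 1 000 bytes.

```python
def control_overhead(control_packet_bytes: int, data_bytes_per_control_packet: int) -> float:
    """Fraction of extra bytes injected per data byte by the control overlay."""
    return control_packet_bytes / data_bytes_per_control_packet


TCP_TRUNK_CTRL = 40   # assumed: header-only TCP/IP packet (not stated in the text)
PING_TRUNK_CTRL = 44  # from footnote 3: 40-byte header + 4 bytes for RTT estimation

cases = [
    ("TCP Trunking, vmss = 1.5 kB", TCP_TRUNK_CTRL, 1_500),
    ("TCP Trunking, vmss = 3.0 kB", TCP_TRUNK_CTRL, 3_000),
    ("Ping Trunking, cwnd = 5 kB", PING_TRUNK_CTRL, 5_000),
    ("Ping Trunking, cwnd = 45 kB", PING_TRUNK_CTRL, 45_000),
]
for label, ctrl, interval in cases:
    print(f"{label}: {100 * control_overhead(ctrl, interval):.2f}% overhead")
```

Under these assumptions, TCP Trunking adds roughly 2.7% or 1.3%, while Ping Trunking adds between about 0.9% and 0.1% over the reported cwnd range.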

7 Conclusions and Future Work 

We have presented Ping Trunking, an enhanced modification of the TCP Trunking technique that shares the available bandwidth among competing aggregates in proportion to their subscribed target rates. The proposed control overlay allows each aggregate to be managed by a single control connection that does not introduce large variations in the transmission rate of the aggregate it handles. In addition, the size of the shared queues at the core of the network is reduced significantly, which helps to support lower latency and jitter. Lastly, the overhead due to the introduction of control packets is also greatly decreased, so the bandwidth used by control packets can be considered practically negligible.

There are some issues requiring further investigation. First, we should consider the stability problems shown by Vegas as delay increases [12]. This could be especially serious when Ping Trunking is used in the core of networks with large delays. Also, our current technique applies to one-to-one network topologies; in future work, we plan to extend it to one-to-many topologies.
Abstract. Differentiated Services (DiffServ) was proposed to provide packet-level service differentiation in a scalable manner. To provide end-to-end service differentiation to users whose connections span multiple domains, an intermediate marker is necessary at the edge routers. However, an intermediate marker suffers from a fairness problem among TCP flows because of TCP's congestion control. In this paper, we propose an aggregate fairness marker (AFM) as an intermediate marker, which works with a user flow three color marker (uf-TCM) operating as a flow marker for each user flow. Through simulations, we show that the proposed AFM can improve fairness among TCP flows with different RTTs.



1 Introduction 

To support quality of service (QoS) [1] in IP-based networks, the Internet Engineering Task Force (IETF) has proposed Differentiated Services (DiffServ) [2]. DiffServ provides simple, predefined service differentiation at the per-hop behavior (PHB) level. The IETF has defined one class for the Expedited Forwarding (EF) PHB and four classes for the Assured Forwarding (AF) PHB [3] [4]. The AF PHB allows an Internet service provider (ISP) to provide different levels of forwarding assurance according to the user profile.

Recent work on DiffServ deals with aggregate flows at edge routers [5]. Random early demotion and promotion (REDP) was proposed for aggregate flow management [6]. In our view, the main contribution of REDP is the introduction of the packet demotion and promotion concept in DiffServ networks. REDP achieves good fairness among UDP flows in demotion and promotion. However, it fails to provide good fairness among TCP flows with different RTTs. The transmission rates of TCP flows depend heavily on the round trip time (RTT) because of TCP's congestion control algorithm, which leads to unfair throughput among TCP flows with different RTTs. Note that this problem is very difficult to resolve at the aggregate flow level without individual state information for each TCP flow, such as its RTT and sending rate.
In this paper, to resolve the TCP fairness problem of an intermediate marker such as REDP, we propose an aggregate fairness marker (AFM) as an intermediate marker. The AFM works with a user flow three color marker (uf-TCM), which we propose as a flow marker for each user flow. The fundamental assumption behind the proposed marker is that the relative transmission rates of TCP flows do not remain constant, so that at certain times a TCP flow with a longer RTT sends relatively more packets than a TCP flow with a shorter RTT. If we decrease the demotion probability and increase the promotion probability at those times, the TCP flow with a longer RTT benefits more from demotion and promotion than the TCP flow with a shorter RTT. The problem, then, is how to determine demotion and promotion probabilities that improve TCP fairness using only aggregate flow state information. We measure the green, yellow, red, and total rates at the edge router and infer the individual TCP flow states from these measurements. The demotion and promotion probabilities are then determined from the inferred individual TCP flow state information. Through simulations, we show that the AFM can improve both TCP fairness and link utilization without per-flow management in multiple-domain environments.



2 The Proposed Scheme

Our proposed scheme consists of a uf-TCM and an AFM operating independently, as shown in Fig. 1. A flow marker is necessary to monitor the user profile; it initially marks the packets of a user flow according to its profile. The proposed uf-TCM is a flow marker that monitors each user profile and marks the packets of the flow as green, yellow, or red. The proposed AFM is an intermediate marker that monitors the aggregate traffic and, according to the incoming traffic conditions, fairly performs packet demotion and promotion.



2.1 User Flow Three Color Marker (uf-TCM) 

For assured services, the packets of each user flow are generally marked initially as either green or red according to whether they conform to the profile. A simple method for monitoring the profile conformance of a flow is to use a token bucket marker. A TCP flow sends packets in a burst within each RTT interval; thus, there are time intervals during which the user source generates no traffic. During these intervals, tokens are likely to be lost from the token bucket, which makes it hard to guarantee minimum throughput to a user [7]. To improve this situation, we use a three color marking process instead of the usual two color marking process at each user flow marker. For this purpose, we propose a user flow three color marker (uf-TCM) based on a simple token bucket algorithm. The operation of the proposed uf-TCM is shown in Fig. 2. A packet is marked green if there are enough tokens in the token bucket. It is marked yellow if there are not enough tokens in the token bucket but there are enough loss tokens. Otherwise, it is marked red. Note that when packets are marked yellow in this way, the throughput of green packets of a flow cannot grow beyond the contracted rate even if yellow packets are promoted to green in the following domains, because yellow packets can consume only the loss tokens. That is, the uf-TCM ensures that the packets of a flow never violate the traffic profile agreed upon between the user and the ISP. However, the throughput of a TCP flow can be improved because yellow packets have a chance to be promoted to green in the following domains. In our scheme, we assume that yellow packets have the same drop precedence as red packets at the core routers. Thus, we can use the RIO buffer management scheme [8].

Fig. 1. The structure of the proposed scheme at the edge routers (legend: host, user access edge router, interdomain edge router, uf-TCM, AFM: Aggregate Fairness Marker)

Fig. 2. The proposed uf-TCM algorithm
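Because Fig. 2 is not reproduced here, the following Python sketch of the uf-TCM marking rule rests on our own assumptions: tokens that would overflow the full bucket are credited to an unbounded loss-token counter, the bucket starts full, and rates and sizes are in bytes. The class and method names are hypothetical.

```python
class UfTcm:
    """Sketch of a user flow three color marker (uf-TCM).

    Assumption: tokens that overflow the full bucket are not discarded but
    counted as "loss tokens", which only yellow packets may consume.
    """

    def __init__(self, rate: float, depth: float):
        self.rate = rate          # contracted rate, bytes per second
        self.depth = depth        # bucket depth, bytes
        self.tokens = depth       # start with a full bucket
        self.loss_tokens = 0.0    # overflowed tokens (unbounded here, an assumption)
        self.last = 0.0           # time of the previous update, seconds

    def _refill(self, now: float) -> None:
        earned = self.rate * (now - self.last)
        self.last = now
        room = self.depth - self.tokens
        if earned > room:
            self.loss_tokens += earned - room  # tokens that would otherwise be lost
            self.tokens = self.depth
        else:
            self.tokens += earned

    def mark(self, now: float, size: float) -> str:
        self._refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return "green"
        if self.loss_tokens >= size:
            self.loss_tokens -= size
            return "yellow"
        return "red"


# 1000 B/s contract, one-packet bucket; the source idles between t = 0 and t = 3 s.
marker = UfTcm(rate=1_000, depth=1_000)
print([marker.mark(0.0, 1_000)] + [marker.mark(3.0, 1_000) for _ in range(4)])
# prints ['green', 'green', 'yellow', 'yellow', 'red']
```

The two yellow packets spend tokens that a plain token bucket would have discarded during the idle period, so the flow still never exceeds its contracted rate in total.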

2.2 Aggregate Fairness Marker (AFM) 

AFM measures the green, yellow, red, and total rates using a rate estimator. The marking decider infers the individual TCP flow states from these measurements and determines the demotion and promotion probabilities using the inferred individual TCP flow state information and the token level of the token bucket (i.e., the number of tokens remaining in the bucket). The marking decider then determines whether green packets are remarked as yellow, or yellow packets as green. Figure 3 shows the functional structure of AFM.

Fig. 3. The functional structure of AFM (blocks shown include a packet classifier and a forwarding engine)

AFM has two operation modes. If the rate of green packets exceeds the aggregate contract rate (the rate negotiated between two domains), AFM enters the demotion mode, in which the token bucket is divided into a demotion region and a balanced region, as shown in Fig. 4. Whenever the token level of the bucket is within the demotion region, AFM fairly demotes green packets to yellow to prevent the phase effect. On the other hand, if the green rate is below the contract rate, AFM enters the promotion mode, in which the token bucket is divided into a balanced region and a promotion region. Whenever the token level is within the promotion region, AFM fairly promotes yellow packets to green to increase link utilization. The balanced region is needed to prevent unnecessary packet demotion and promotion [6].
Fig. 4. The operation modes of AFM



As previously stated, because of TCP's congestion control algorithm, a TCP flow with a shorter RTT generates more green and yellow packets than a TCP flow with a longer RTT, which leads to unfair throughput among TCP flows with different RTTs as well as lower link utilization. To resolve this problem, AFM measures the green, yellow, red, and total rates at an edge router, infers the individual TCP flow states from these measurements, and determines the demotion and promotion probabilities using the inferred individual TCP flow state information. In this way, TCP fairness is improved using only aggregate flow state information. In the following, we explain why this approach works.

We first consider the over-provisioning case, i.e., the sum of the user contract 
rates is lower than the aggregate contract rate. In this case, a promotion situation 
is dominant. Figure 5(a) shows the ratios of the incoming red traffic rate to 
the total incoming traffic rate and the incoming yellow traffic rate to the total 
incoming traffic rate under an over-provisioning situation at an edge router. It is 
well known that the throughput of TCP oscillates due to its congestion control 
algorithm. Thus, the ratio of the incoming red traffic rate to the total incoming 
traffic rate at an edge router also oscillates. If the ratio is close to the maximum 
value ’A’ as shown in Fig. 5(a), we can infer that the sending rates of the TCP 
flows with a shorter RTT are also close to the maximum value ’A’, as shown in 
Fig. 5(b). That is, the TCP flows with a shorter RTT determine the dynamics 
of the aggregate flow. Therefore, if we increase the promotion probability in this 
case, it may bring about unfairness because of the increased probability that 
the TCP flows with a shorter RTT will receive more promoted packets than the 
TCP flows with a longer RTT. Therefore, it is desirable to lower the promotion 
probability in this case. 





Fig. 5. The traffic ratio in the over-provisioning case and TCP behavior: (a) the traffic ratio, (b) the sending rate (x-axis of both panels: simulation time, s)



On the other hand, if the ratio is close to the minimum value ’B’ shown in Fig. 5(a), we can infer that the sending rates of the TCP flows with a shorter RTT are also close to the minimum value ’B’ shown in Fig. 5(b). This is because when the sending rates of the TCP flows with a shorter RTT reach their maximum values, their packet dropping probability becomes much higher than that of the TCP flows with a longer RTT, and thus their sending rates decrease. This reduction, however, increases the bandwidth available to the TCP flows with a longer RTT, so their sending rates rise until the time corresponding to the minimum value ’B’. After that, the ratio increases again as the TCP flows with a shorter RTT begin recovering their previous transmission windows. From this observation, if we increase the promotion probability when TCP flows with a longer RTT send relatively more packets than TCP flows with a shorter RTT, a TCP flow with a longer RTT benefits more from promotion than a TCP flow with a shorter RTT, which means that TCP fairness is improved. Similarly, we decrease the promotion probability when TCP flows with a shorter RTT send relatively more packets than TCP flows with a longer RTT. Therefore, we can improve TCP fairness by using the ratio of the incoming red traffic rate to the total incoming traffic rate, without per-flow management. For the over-provisioning case, the packet promotion probability $P_{promo}$ is determined as follows:

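$$P_{promo} = 1 - \frac{\text{red\_rate}}{\text{total\_rate}}\left(1 - \frac{\text{number\_of\_tokens} - \text{threshold}}{\text{bucket\_depth} - \text{threshold}}\right)$$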

Fig. 6. The traffic ratio in the under-provisioning case and TCP behavior

We next consider the under-provisioning case, i.e., the sum of the user contract rates is higher than the aggregate contract rate. Even in this case, because of the TCP congestion control algorithm, the incoming green traffic rate rarely exceeds the aggregate contract rate at an edge router, so promotion situations still dominate. If the network is under-provisioned, the packets of a TCP flow cannot consume all of the loss tokens in most cases, and thus TCP flows hardly generate any red traffic, as shown in Fig. 6(a). However, the TCP flows with a shorter RTT generate more yellow traffic than the TCP flows with a longer RTT. Therefore, unlike the over-provisioning case, the ratio of the incoming yellow traffic rate to the total incoming traffic rate should be used to determine the promotion probability when the network is under-provisioned. As in the over-provisioning case, it is desirable to lower the promotion probability when the ratio of the yellow traffic rate to the total traffic rate becomes higher. For the under-provisioning case, the packet promotion probability $P_{promo}$ is determined as follows:

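$$P_{promo} = 1 - \frac{\text{yellow\_rate}}{\text{total\_rate}}\left(1 - \frac{\text{number\_of\_tokens} - \text{threshold}}{\text{bucket\_depth} - \text{threshold}}\right)$$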

Finally, we consider the case in which the incoming green traffic rate exceeds the negotiated contract rate at interdomain edge routers. In this case, green packets should be demoted to yellow. In the demotion situation, we do not need to consider the over-provisioning and under-provisioning cases separately, because the ratio of the incoming green traffic rate to the aggregate contract rate at an interdomain edge router also oscillates according to the input traffic conditions. In demotion mode, the higher the incoming rate of green packets, the more the demotion probability $P_{demo}$ must be increased. For both provisioning cases, the packet demotion probability $P_{demo}$ is determined as follows:

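$$P_{demo} = \left(1 - \frac{\text{contract\_rate}}{\text{green\_rate}}\right)\left(1 - \frac{\text{number\_of\_tokens} - \text{threshold}}{\text{bucket\_depth} - \text{threshold}}\right)$$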

Note that $P_{promo}$ and $P_{demo}$ are determined using not only the traffic ratios but also the token level of the token bucket, because the token level reflects the difference between the incoming green traffic rate and the contract rate. For example, if there are enough tokens in the bucket, making the promotion probability higher increases link utilization in a promotion situation; conversely, making the demotion probability lower prevents a loss of link utilization in a demotion situation.

AFM algorithm

if contract_rate >= green_rate
    enter Promotion mode
    if threshold >= number_of_tokens
        enter Balanced region
        P_promo = 0
    elseif threshold < number_of_tokens
        enter Promotion region
        if red_rate/total_rate > 0.01
            decision over_provisioning
            R_promo = red_rate/total_rate
        elseif red_rate/total_rate <= 0.01
            decision under_provisioning
            R_promo = yellow_rate/total_rate
        P_promo = 1 - R_promo * (1 - (number_of_tokens - threshold)
                                     / (bucket_depth - threshold))
elseif contract_rate < green_rate
    enter Demotion mode
    if threshold >= bucket_depth - number_of_tokens
        enter Balanced region
        P_demo = 0
    elseif threshold < bucket_depth - number_of_tokens
        enter Demotion region
        R_demo = 1 - contract_rate/green_rate
        P_demo = R_demo * (1 - (number_of_tokens - threshold)
                               / (bucket_depth - threshold))
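The pseudocode can be turned into a runnable Python sketch of the marking decision. The rates would come from the rate estimator; the function names, the `MeasuredRates` container, and the clamping of $P_{demo}$ to [0, 1] are our own additions, and the demotion probability follows the displayed $P_{demo}$ equation.

```python
import random
from dataclasses import dataclass


@dataclass
class MeasuredRates:
    """Aggregate rates seen by AFM (any consistent unit, e.g. bit/s)."""
    green: float
    yellow: float
    red: float
    total: float


def afm_decide(contract_rate, rates, tokens, bucket_depth, threshold, red_cutoff=0.01):
    """Return (mode, region, probability) following the AFM algorithm."""
    fill = (tokens - threshold) / (bucket_depth - threshold)
    if contract_rate >= rates.green:                      # promotion mode
        if tokens <= threshold:
            return "promotion", "balanced", 0.0
        red_ratio = rates.red / rates.total
        # More than 1% red traffic -> over-provisioning; otherwise use the yellow ratio.
        ratio = red_ratio if red_ratio > red_cutoff else rates.yellow / rates.total
        return "promotion", "promotion", 1.0 - ratio * (1.0 - fill)
    if bucket_depth - tokens <= threshold:                # demotion mode
        return "demotion", "balanced", 0.0
    r_demo = 1.0 - contract_rate / rates.green
    p_demo = r_demo * (1.0 - fill)
    return "demotion", "demotion", min(1.0, max(0.0, p_demo))  # clamp: our addition


def remark(color, mode, probability, rng=random.random):
    """Promote yellow->green or demote green->yellow with the given probability."""
    if mode == "promotion" and color == "yellow" and rng() < probability:
        return "green"
    if mode == "demotion" and color == "green" and rng() < probability:
        return "yellow"
    return color


# Bucket of 60 packets with a 10-packet balanced region, as in Section 3.
rates = MeasuredRates(green=3e6, yellow=0.8e6, red=0.2e6, total=4e6)
print(afm_decide(4e6, rates, tokens=35, bucket_depth=60, threshold=10))
```

Here 5% of the traffic is red, so the over-provisioning branch applies and the promotion probability is about 0.975; a fuller bucket pushes it towards 1.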



In the proposed AFM algorithm, the operation mode is determined by comparing the contract rate with the rate of green packets. In a promotion situation, when the ratio of the incoming red traffic rate to the total incoming traffic rate at an edge router is below 0.01, we assume an under-provisioning case. The cutoff is slightly above zero because TCP flows with a shorter RTT may send some red packets even when the network is under-provisioned. The AFM uses a time sliding window (TSW) [8] to estimate the incoming traffic rates at the edge router.
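The TSW estimator is commonly formulated as a per-packet update that spreads the bytes of the previous window, plus the new packet, over the elapsed time plus the window length. A minimal Python sketch, assuming one estimator per color plus one for the total rate, and a window length chosen by the operator (1 s here, an example value):

```python
class TswRateEstimator:
    """Time sliding window (TSW) rate estimator, in bytes per second."""

    def __init__(self, win_length: float, initial_rate: float = 0.0):
        self.win_length = win_length  # averaging window, seconds
        self.avg_rate = initial_rate  # current estimate
        self.t_front = 0.0            # arrival time of the previous packet

    def update(self, now: float, pkt_bytes: float) -> float:
        bytes_in_win = self.avg_rate * self.win_length
        new_bytes = bytes_in_win + pkt_bytes
        # Spread the window's bytes plus the new packet over elapsed time + window.
        self.avg_rate = new_bytes / (now - self.t_front + self.win_length)
        self.t_front = now
        return self.avg_rate


# AFM would keep one estimator per color and one for the total traffic.
estimators = {c: TswRateEstimator(win_length=1.0) for c in ("green", "yellow", "red", "total")}
for k in range(1, 201):                      # a 1000-byte green packet every 100 ms
    now = k * 0.1
    estimators["green"].update(now, 1_000)
    estimators["total"].update(now, 1_000)
print(round(estimators["green"].avg_rate))   # converges to 10000 bytes/s
```

The ratios used by AFM (red or yellow rate over total rate) are then simply quotients of these estimates.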

3 Performance Study 

In this section, we analyze the performance of the proposed scheme in comparison with REDP combined with a two color token bucket marker acting as a flow marker, and REDP combined with the proposed uf-TCM. We implemented a simple RIO queue [8] in the ns-2 simulator [9], with parameters ($q_{min}/q_{max}/P_{max}$) = 45/60/0.02 for in packets and 20/40/0.2 for out packets. We set the bucket depth to one packet for a flow marker and to 60 packets for an intermediate marker. For REDP, $T_l$ is set to 15 packets, $T_h$ to 45 packets, $MAX_{demo}$ to 0.5, and $MAX_{promo}$ to 0.5 [6]. For AFM, based on a number of simulations, we set the range of the balanced region to 10 packets.
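For reference, the RIO parameters above describe two RED drop curves. The sketch below is a simplified version (no count-based spacing of drops, and the average queue lengths are supplied by the caller). It assumes, as is usual for RIO, that in (green) packets are judged on the average queue of in packets and out (yellow and red) packets on the average total queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedCurve:
    """Simplified RED drop curve (average queue lengths in packets)."""
    q_min: float
    q_max: float
    p_max: float

    def drop_prob(self, avg_queue: float) -> float:
        if avg_queue < self.q_min:
            return 0.0
        if avg_queue >= self.q_max:
            return 1.0
        return self.p_max * (avg_queue - self.q_min) / (self.q_max - self.q_min)


IN_CURVE = RedCurve(q_min=45, q_max=60, p_max=0.02)   # in (green) packets
OUT_CURVE = RedCurve(q_min=20, q_max=40, p_max=0.2)   # out (yellow/red) packets


def rio_drop_prob(is_in: bool, avg_in_queue: float, avg_total_queue: float) -> float:
    """In packets use the in-packet average; out packets use the total average."""
    if is_in:
        return IN_CURVE.drop_prob(avg_in_queue)
    return OUT_CURVE.drop_prob(avg_total_queue)


print(rio_drop_prob(True, avg_in_queue=52.5, avg_total_queue=70.0))
print(rio_drop_prob(False, avg_in_queue=10.0, avg_total_queue=30.0))
```

With these settings, out packets are dropped much earlier and more aggressively than in packets, which is what gives green packets their drop precedence.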



3.1 Multiple Domain Scenario

Figure 7 shows the simulation model for the multiple domain case. There are 10 TCP hosts, and each host contracts 400 Kbps.

The RTTs of the flows range from 30 ms to 84 ms in steps of 6 ms, and each flow traverses domains 1 and 2. The contract rate of the aggregate flow in domain 1 is 4 Mbps, so domain 1 is exactly provisioned. In domain 2, the interdomain negotiated contract rate is $a$ Mbps, as shown in Fig. 7; we set $a$ to 6 Mbps for the over-provisioning (150% provisioning) case, 4 Mbps for the exact-provisioning (100% provisioning) case, and 3 Mbps for the under-provisioning (75% provisioning) case.

Figure 8 shows the average total throughput of each host for the three cases. Our proposed scheme shows the best results in all cases. REDP combined with the two color token bucket marker, which corresponds to the original REDP, cannot improve TCP fairness. REDP combined with the proposed uf-TCM, however, improves TCP fairness to some degree, as well as link utilization, compared with the original REDP scheme, because each flow source generates yellow packets through the uf-TCM, and these packets have a chance to be promoted to green at the edge router. This means that the throughput of each flow can be improved. However, compared with our proposed scheme, the improvement is limited because of the simple probability decision of the REDP marker, which depends only on the token level. For the multiple domain simulation model, our simulation results show that the fairness indexes [10] of REDP / REDP (uf-TCM) / AFM (uf-TCM) are 0.874 / 0.897 / 0.988 for the over-provisioning case, 0.848 / 0.943 / 0.983 for the exact-provisioning case, and 0.928 / 0.892 / 0.976 for the under-provisioning case, respectively.

Fig. 7. The simulation model for the multiple domain case

Fig. 8. The average total throughput of each host for the three cases: (a) over-provisioning case, (b) exact-provisioning case, (c) under-provisioning case
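The text does not spell out the fairness index of [10]. Assuming it is Jain's index, $(\sum_i x_i)^2 / (n \sum_i x_i^2)$, which equals 1 when all hosts receive the same throughput, it can be computed as follows; the throughput values are hypothetical.

```python
def jain_index(throughputs) -> float:
    """Jain's fairness index (assumed to be the index of [10]): in (0, 1]."""
    xs = list(throughputs)
    total = sum(xs)
    return total * total / (len(xs) * sum(x * x for x in xs))


print(jain_index([400.0] * 10))                  # prints 1.0 (perfectly fair)
print(jain_index([600.0] * 5 + [200.0] * 5))     # prints 0.8 (hypothetical unfair split)
```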
4 Conclusion

To provide end-to-end service differentiation for assured services in DiffServ, an intermediate marker is necessary in addition to a flow marker that initially marks each user flow. To the best of our knowledge, no previous work has resolved the TCP fairness problem of intermediate markers, because it is very difficult to resolve without individual TCP flow information. In this paper, to resolve this problem, we have proposed an aggregate fairness marker (AFM) as an intermediate marker, which works with a user flow three color marker (uf-TCM) operating as a flow marker for each user flow. The proposed AFM improves fairness among TCP flows with different RTTs without per-flow management in multiple domains.
Abstract. In this paper, we propose an AIMD-based TCP load balancing architecture for a backbone network in which TCP flows are split between two explicitly routed paths, namely the primary and the secondary paths. We propose that primary paths have strict priority over secondary paths with respect to packet forwarding, and that both paths are rate-controlled using ECN marking in the core and AIMD rate adjustment at the ingress nodes. We call this technique "prioritized AIMD". The buffers maintained at the ingress nodes for the two alternative paths help us predict the delay difference between the two paths, which forms the basis for deciding on which path to forward a newly arriving flow. We provide a simulation study of a large mesh network to demonstrate the efficiency of the proposed approach in terms of the average per-flow goodput and byte blocking rates.

Keywords: Traffic engineering; load balancing; multi-path routing; TCP.

1 Introduction 

IP Traffic Engineering (TE) controls how traffic flows through an IP network in order to optimize resource utilisation and network performance [4]. In multi-path routing-based TE, multiple explicitly routed paths with possibly disjoint links and nodes are established between two end points of a network in order to optimize resource utilisation through intelligent traffic splitting. These explicitly routed paths are readily implementable using standards-based layer 2 technologies such as ATM or MPLS, or using source-routed IP tunnels. The work in [5] proposes a dynamic multi-path routing algorithm for connection-oriented networks in which the shortest path is used under light traffic conditions and, as the shortest path becomes congested, multiple paths are used, subject to their availability, in order to balance the load. Recently, there have been a number of multi-path TE proposals specifically for MPLS networks that are amenable to distributed online implementation. In [7], the ingress node uses a gradient projection algorithm to balance the load among the Label Switched Paths (LSPs) by sending probe packets into the network and collecting congestion status. Additive Increase/Multiplicative Decrease (AIMD) feedback algorithms are generally used for flow and congestion control in computer and communication networks [6]. The multi-path AIMD-based approach of [17] uses binary feedback information to detect the congestion state of the LSPs, together with an AIMD-based traffic splitting heuristic which ensures that source LSRs do not send traffic to longer secondary paths before making full use of their primary paths.
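As a rough illustration of how AIMD rate control with ECN feedback could drive the two-path scheme outlined in the abstract, the sketch below adjusts the allowed rate of each path and sends a new flow to the path with the smaller predicted delay. Everything here is an assumption for illustration: the increase step, the decrease factor, the delay estimate (ingress backlog divided by allowed rate, plus a fixed extra propagation delay for the secondary path), and all names. Strict priority of primary over secondary traffic in the core is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PathRegulator:
    """AIMD-controlled sending rate for one explicitly routed path (hypothetical)."""
    rate: float                 # allowed rate, bit/s
    backlog_bits: float = 0.0   # data waiting in this path's ingress buffer
    incr: float = 100e3         # additive increase per feedback interval (assumed)
    decr: float = 0.5           # multiplicative decrease factor (assumed)
    min_rate: float = 10e3      # floor so the path keeps probing (assumed)

    def on_feedback(self, ecn_marked: bool) -> None:
        if ecn_marked:
            self.rate = max(self.min_rate, self.rate * self.decr)
        else:
            self.rate += self.incr

    def predicted_delay(self) -> float:
        # Time needed to drain the ingress buffer at the current allowed rate.
        return self.backlog_bits / self.rate


def choose_path(primary: PathRegulator, secondary: PathRegulator,
                extra_secondary_delay: float) -> str:
    """Assign a new flow to the path with the smaller predicted delay."""
    d_primary = primary.predicted_delay()
    d_secondary = secondary.predicted_delay() + extra_secondary_delay
    return "primary" if d_primary <= d_secondary else "secondary"


primary = PathRegulator(rate=1e6, backlog_bits=1e6)   # 1 s of queued data
secondary = PathRegulator(rate=1e6)                   # empty buffer
print(choose_path(primary, secondary, extra_secondary_delay=0.010))  # prints secondary
```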

Some multi-path routing proposals may cause de-sequencing (or reordering) of the packets of a TCP flow. This is due to sending successive packets of a TCP flow over different paths with different one-way delays. The majority of the traffic in the current Internet is based on TCP, and this packet de-sequencing adversely affects the application-layer performance of TCP flows [10]. To avoid packet de-sequencing in multi-path routing, a flow-based splitting scheme that operates on a per-flow basis can be used [16]. In [14], flow-based multi-path routing of elastic flows is discussed. Flow-based routing in the QoS routing context in MPLS networks is described in [11], but the requirement for flow awareness inside the core network may cause scalability problems as the number of simultaneous flows increases.

Recently, a new scalable flow-based multi-path TE approach for best-effort IP/MPLS networks was proposed in [2]; it employs max-min fair bandwidth sharing using an explicit rate control mechanism. This approach imposes flow awareness only at the edges of an MPLS backbone. That work demonstrates the performance enhancements attained by flow-based splitting through comparisons with packet-based (i.e., non-flow-based) multi-path routing and with single-path routing when streaming (i.e., UDP) traffic is used. Significant reductions in packet loss rates relative to single-path routing are obtained in all the scenarios tested. The architecture was then studied for load balancing of elastic (i.e., TCP) traffic with AIMD-based rate control (chosen over explicit rate control for the sake of practicality) using a simple three-node topology [3]. It is shown in [3] that the flow-based multi-path routing method consistently outperforms single-path routing. In the current paper, we provide an extensive simulation study of the approach proposed in [3] for TCP load balancing in larger, realistically sized mesh networks.

It is well known that the use of longer alternative paths by some sources forces other sources, whose min-hop paths share links with these alternative paths, to also use alternative paths [13]. This phenomenon is called the knock-on effect in the literature and has been studied in depth for alternately routed circuit-switched networks [9]. Precautions should be taken to mitigate the knock-on effect; an example is the well-known "trunk reservation" concept in circuit-switched networks [9]. One of the key ingredients of our proposed architecture is the use of strict priority queuing that favors packets on primary paths (PP) over those on secondary paths (SP) to cope with the knock-on effect. In this paper, we also compare and contrast strict priority queuing with the widely deployed FIFO queuing in terms of their ability to deal with the knock-on effect in the TCP load-balancing context.

The remainder of the paper is organized as follows. In Section 2, we present 
our TE architecture. We provide our simulation results in Section 3. The final 
section is devoted to conclusions and future work. 

2 Architecture 

This section is mainly based on [3], but the proposed architecture is outlined here for the sake of completeness. In this study, we envision an IP backbone network which consists of edge and core nodes (i.e., routers) and which has mechanisms for establishing explicitly routed paths. In this network, edge (ingress or egress) nodes are gateways that originate/terminate explicitly routed paths, and core nodes carry only transit traffic. Edge nodes are responsible for per-egress and per-class queuing, flow classification, traffic splitting, and rate control. Core nodes support per-class queuing and Explicit Congestion Notification (ECN) marking. In this architecture, the flow awareness requirement is restricted to edge nodes, which makes the overall architecture scale better than some other flow-based architectures.

Our architecture is based on the following building blocks: (i) queuing in network nodes, (ii) path establishment, (iii) feedback mechanism and rate control, and (iv) traffic splitting. As far as queuing is concerned, the core nodes employ per-class queuing with three drop-tail queues, namely the gold, silver, and bronze queues, and strict priority queuing with the highest (lowest) priority given to the gold (bronze) queue. The gold queue is used for Resource Management (RM) packets and TCP ACKs. We envision that ACK packets are identified by the ingress node and that the encapsulation header of such packets is marked accordingly. The silver and bronze queues are used for TCP data packets according to the selection of paths, as explained below. We assume in this study that edge nodes are single-homed, i.e., each has a link to a single core node. We set up one PP and one SP from each ingress node to every other egress node, and we require the two paths to be link-disjoint within the scope of the core network. The PP is established first as the min-hop path; if there are multiple min-hop paths, the one with the minimum propagation delay is chosen as the PP. To find the route for the SP, we prune the links used by the PP and compute the min-hop path in the remaining network graph, breaking ties in the same way. If connectivity is lost after pruning, we do not establish an SP. We prefer this simple path selection scheme since we do not assume a priori knowledge of the traffic demand matrix.
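The PP/SP selection rule above can be sketched as follows. This is a minimal illustration, not the authors' ns-2 code: it assumes the core network is given as a hypothetical adjacency map of propagation delays (edge nodes are single-homed, so only core links are considered), runs Dijkstra on the lexicographic cost (hop count, propagation delay) so that min-hop selection and the delay tie-break happen in one pass, and assumes bidirectional links, pruning both directions of every PP link before searching for the SP.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical core-network graph: node -> {neighbour: propagation delay in seconds}.
Graph = Dict[str, Dict[str, float]]
INF = (float("inf"), float("inf"))


def min_hop_path(graph: Graph, src: str, dst: str,
                 banned: FrozenSet[Tuple[str, str]] = frozenset()) -> Optional[List[str]]:
    """Min-hop path from src to dst, ties broken by total propagation delay.

    Dijkstra on the lexicographic cost (hops, delay); directed links listed
    in `banned` are ignored. Returns None if dst is unreachable.
    """
    best = {src: (0, 0.0)}
    prev: Dict[str, str] = {}
    heap = [(0, 0.0, src)]
    while heap:
        hops, delay, u = heapq.heappop(heap)
        if (hops, delay) > best[u]:
            continue  # stale heap entry
        if u == dst:
            break
        for v, d in graph[u].items():
            if (u, v) in banned:
                continue
            cand = (hops + 1, delay + d)
            if cand < best.get(v, INF):
                best[v] = cand
                prev[v] = u
                heapq.heappush(heap, (cand[0], cand[1], v))
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def primary_and_secondary(graph: Graph, src: str, dst: str):
    """PP = min-hop path; SP = min-hop path after pruning the PP's links (or None)."""
    pp = min_hop_path(graph, src, dst)
    if pp is None:
        return None, None
    hops = list(zip(pp, pp[1:]))
    # Assumption: links are bidirectional, so both directions of each PP link are pruned.
    pruned = frozenset(hops) | frozenset((v, u) for u, v in hops)
    return pp, min_hop_path(graph, src, dst, pruned)
```

Because both cost components are non-negative and additive, the lexicographic ordering preserves Dijkstra's invariant, so no separate tie-breaking pass is needed.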

Table 1. The AIMD algorithm

    if RM packet marked as CE
        ATR := ATR - RDF × ATR
    else
        ATR := ATR + RIF × PTR
        ATR := min(ATR, PTR)
    ATR := max(ATR, MTR)

In this paper, we study two queuing models based on the work in [2]. The first one is FIFO (first-in-first-out) queuing, in which all TCP data packets join the silver queue irrespective of the type of path they ride on. However, this queuing policy triggers the knock-on effect due to the lack of preferential treatment for packets using fewer resources (i.e., traversing fewer hops): the use of longer secondary paths by some sources may force other sources, whose primary paths share links with these secondary paths, to also use secondary paths. To mitigate this cascading effect, longer secondary paths should be resorted to only if primary paths can no longer accommodate additional traffic. Based on the work described in [2] and [3], we propose strict priority queuing, in which TCP data packets routed over PPs use the silver service and those routed over SPs receive the bronze service.
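A sketch of the per-class core queuing, assuming a hypothetical `CoreOutputPort` with one byte-counted drop-tail buffer per class. The `prioritize_pp` flag switches between the two models described above: strict priority (PP data in silver, SP data in bronze) and FIFO (all TCP data in silver). Link transmission timing is deliberately left out; only the enqueue/drop and service-order logic is shown.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# (kind, path, size_bytes); kind in {"rm", "ack", "data"}, path in {"pp", "sp"}
Packet = Tuple[str, str, int]


class CoreOutputPort:
    """Hypothetical output port: three drop-tail queues served with strict priority."""

    CLASSES = ("gold", "silver", "bronze")  # highest to lowest priority

    def __init__(self, buffer_bytes: int, prioritize_pp: bool = True):
        self.buffer_bytes = buffer_bytes      # per-queue buffer size in bytes
        self.prioritize_pp = prioritize_pp    # True: strict priority model, False: FIFO model
        self.queues: Dict[str, Deque[Packet]] = {c: deque() for c in self.CLASSES}
        self.occupancy: Dict[str, int] = {c: 0 for c in self.CLASSES}

    def classify(self, kind: str, path: str) -> str:
        if kind in ("rm", "ack"):
            return "gold"                     # RM packets and TCP ACKs
        if self.prioritize_pp and path == "sp":
            return "bronze"                   # TCP data on secondary paths
        return "silver"                       # PP data (and all data in the FIFO model)

    def enqueue(self, kind: str, path: str, size: int) -> bool:
        c = self.classify(kind, path)
        if self.occupancy[c] + size > self.buffer_bytes:
            return False                      # drop-tail: discard the arriving packet
        self.queues[c].append((kind, path, size))
        self.occupancy[c] += size
        return True

    def dequeue(self) -> Optional[Tuple[str, Packet]]:
        for c in self.CLASSES:                # serve the highest non-empty class first
            if self.queues[c]:
                pkt = self.queues[c].popleft()
                self.occupancy[c] -= pkt[2]
                return c, pkt
        return None
```

Under strict priority a bronze (SP) packet is sent only when the gold and silver queues are empty, which is what keeps SP traffic from displacing PP traffic on shared links.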

Another building block of the proposed architecture is the feedback mechanism and rate control. In our proposed architecture, ingress nodes periodically send RM packets to egress nodes, one over the PP (P-RM) and the other over the SP (S-RM). These RM packets are sent every $T_{RM}$ seconds with the direction bit set to indicate the direction of flow. If strict priority queuing is used, when a P-RM (S-RM) packet arrives at a core node on its forward path, the node compares the percentage queue occupancy of its silver (bronze) queue on the outgoing interface with a predetermined configuration parameter $\rho$ and sets the CE (Congestion Experienced) bit of the P-RM (S-RM) packet (if not already set) when the occupancy exceeds $\rho$. If FIFO queuing is used, the silver queue occupancy is checked for both P-RM and S-RM packets. When an RM packet arrives at the egress node, it is sent back to the ingress node after its direction bit is reset. RM packets travelling over the reverse path are not marked by the core nodes. When the RM packet arrives back at the ingress node, the CE bit indicates the congestion status of the path it was sent over. Based on this information, the ingress node updates the Allowed Transmission Rate (ATR) of the corresponding rate-controlled path using the AIMD algorithm given in Table 1 [6]. In this algorithm, MTR and PTR denote the Minimum Transmission Rate and Peak Transmission Rate, and RDF and RIF denote the Rate Decrease Factor and Rate Increase Factor, respectively. Therefore, an ingress node maintains two per-egress queues, one for the PP and the other for the SP, that are drained using AIMD-based rate control. The proposed architecture is depicted in Fig. 1 for an example 3-node network, in which solid lines denote PPs and dotted lines denote SPs originating at ingress node 0. We also assume that the switching technology in the core network has the necessary fields in the encapsulation header for implementing the above-mentioned mechanisms.
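The RM marking and the AIMD update of Table 1 can be written as three small functions. This is a sketch under stated assumptions: the occupancy threshold (written `rho` below) is expressed as a fraction, e.g. 0.20 for 20%, a core node marks CE when the checked queue's occupancy strictly exceeds it, and the helper names are hypothetical.

```python
from typing import Iterable


def rm_queue(rm_type: str, strict_priority: bool) -> str:
    """Queue whose occupancy a core node checks for a forward RM packet."""
    if strict_priority and rm_type == "S-RM":
        return "bronze"
    return "silver"  # P-RM always, and S-RM under FIFO queuing


def forward_ce(occupancy_fractions: Iterable[float], rho: float) -> bool:
    """CE bit after the forward trip: once set by any hop it stays set."""
    ce = False
    for occ in occupancy_fractions:  # checked-queue occupancy at each hop, in [0, 1]
        ce = ce or occ > rho         # assumption: mark only when strictly above rho
    return ce


def aimd_update(atr: float, ce: bool, ptr: float, mtr: float,
                rif: float, rdf: float) -> float:
    """One ATR update per returning RM packet, following Table 1."""
    if ce:
        atr = atr - rdf * atr        # multiplicative decrease
    else:
        atr = atr + rif * ptr        # additive increase in steps of RIF x PTR
        atr = min(atr, ptr)          # never exceed the peak rate
    return max(atr, mtr)             # never fall below the minimum rate
```

For example, with a hypothetical PTR of 45 Mbit/s and RIF = RDF = 0.0625, an unmarked RM packet raises an ATR of 10 Mbit/s to 12.8125 Mbit/s, while a marked one lowers it to 9.375 Mbit/s.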

Fig. 1. The proposed architecture for an example 3-node network

The final ingredient of the proposed approach is the way we split traffic over the PP and the SP. The edge nodes first identify new flows. The delay estimates for the PP and SP queues in the edge nodes (denoted by $D_{PP}$ and $D_{SP}$, respectively) are then calculated by dividing the occupancy of the corresponding queue by its current drain rate. Upon the arrival of the first packet of the $n$th flow (i.e., a TCP SYN segment), a running estimate of the delay difference, denoted by $d_n$, is calculated as

$$d_n = \beta (D_{PP} - D_{SP}) + (1 - \beta) d_{n-1},$$

where $\beta$ is the smoothing parameter. If $d_n < min_{th}$ ($d_n > max_{th}$), then we forward the flow over the PP (SP). When $min_{th} \le d_n \le max_{th}$, the new flow is forwarded over the SP with probability $p_0 (d_n - min_{th})/(max_{th} - min_{th})$, where $min_{th}$, $max_{th}$ and $p_0$ are the splitting algorithm parameters to be set. In this paper, we use $p_0 = 1$. Once a path decision is made for the first packet of a flow, all the remaining packets of the flow follow the same path. This traffic splitting mechanism is called Random Early Reroute (RER); it is inspired by the RED (Random Early Detection) algorithm used for active queue management in the Internet [8], and the two algorithms have similar parameters. RED is used for controlling the average queue occupancy, whereas RER controls the average smoothed delay difference of the silver and bronze queues. RER parameters are generally chosen so that the PP is favoured (i.e., $min_{th} > 0$) and proportional control (as opposed to on-off control) is used, i.e., $max_{th} > min_{th}$.
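The RER decision can be sketched as below. It assumes delays in seconds, an initial estimate $d_0 = 0$, and a hypothetical per-flow table keyed by a flow identifier; the class name and interface are illustrative rather than taken from the authors' implementation. Setting `min_th = max_th = 0` turns it into the SD policy used for comparison in Section 3, with exact ties sent to the PP as an added assumption.

```python
import random
from typing import Dict, Optional


class RandomEarlyReroute:
    """Hypothetical per-ingress RER splitter: pins each new flow to the PP or the SP."""

    def __init__(self, min_th: float, max_th: float, beta: float,
                 p0: float = 1.0, rng: Optional[random.Random] = None):
        self.min_th, self.max_th = min_th, max_th   # thresholds in seconds
        self.beta, self.p0 = beta, p0
        self.d = 0.0                                # d_{n-1}; initial value 0 is an assumption
        self.rng = rng or random.Random()
        self.flow_table: Dict[str, str] = {}        # flow id -> "pp" or "sp"

    @staticmethod
    def queue_delay(occupancy_bits: float, drain_rate_bps: float) -> float:
        """Delay estimate of an edge queue: occupancy divided by current drain rate."""
        return occupancy_bits / drain_rate_bps

    def _decide(self, d_pp: float, d_sp: float) -> str:
        # Smoothed delay difference d_n = beta (D_PP - D_SP) + (1 - beta) d_{n-1}
        self.d = self.beta * (d_pp - d_sp) + (1 - self.beta) * self.d
        if self.d < self.min_th:
            return "pp"
        if self.d > self.max_th:
            return "sp"
        if self.max_th <= self.min_th:
            return "pp"  # SD setting with d exactly at the threshold: tie goes to PP (assumption)
        p = self.p0 * (self.d - self.min_th) / (self.max_th - self.min_th)
        return "sp" if self.rng.random() < p else "pp"

    def path_for(self, flow_id: str, d_pp: float, d_sp: float) -> str:
        """Decide on the first packet (SYN) of a flow; later packets reuse the decision."""
        if flow_id not in self.flow_table:
            self.flow_table[flow_id] = self._decide(d_pp, d_sp)
        return self.flow_table[flow_id]
```

Pinning the decision in `flow_table` is what avoids the packet de-sequencing discussed in the introduction: the estimate $d_n$ is updated only when a new flow arrives, never per packet.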

3 Simulation Study 

In this paper, we present the simulation results of our AIMD-based multi-path TE algorithm for TCP traffic over a mesh network called the hypothetical US topology, which has 12 POPs (Points of Presence). This network topology and the traffic demand matrix are given in www.fictitious.org/omp and are also described in [2]. The proposed TCP TE architecture is implemented in ns-2 (Network Simulator) version 2.27 [12], and TCP-Reno is used in our simulations. We introduced a number of new modules and modifications in ns-2 that are available in [1].

In our simulations, we scaled down the capacities of all links and the demand matrix by a factor of 45/155 (i.e., all OC-3 links are replaced with DS-3 links) to reduce the simulation run-times. We assume that each of the POPs has one edge node connected via a very high-speed link to one core node. We use a traffic model in which flow arrivals occur according to a Poisson process and flow sizes have a bounded Pareto distribution, denoted by BP($k$, $p$, $\alpha$) [15]. The following parameters are used for the bounded Pareto distribution in this study: $k$ = 4000 bytes, $p$ = 50 × 10^6 bytes, and $\alpha$ = 1.20, corresponding to a mean flow size of $m$ = 20,362 bytes. The delay averaging parameter is set to $\beta$ = 0.3. TCP data packets are assumed to be 1040 bytes long and RM packets 50 bytes long (after encapsulation). All the buffers at the edge and core nodes, including the per-egress (primary and secondary) and per-class (gold, silver and bronze) queues, have a size of 104,000 bytes each. The TCP receive buffer is 20,000 bytes long. We fix the following parameters for the AIMD algorithm. PTR is chosen as the speed of the slowest link on the corresponding path. We use a very small but nonzero MTR in order to avoid division by zero in the simulations. If the expected delay of a buffer exceeds 0.36 s, then the packets destined to the corresponding queue are dropped. We use $T_{RM}$ = 0.02 s and $\rho$ = 20%. The simulation runtime is 300 s. We report only the statistics related to the flows initiated in the interval [90 s, 250 s].
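The quoted mean flow size follows from the mean of the bounded Pareto distribution, $E[X] = \frac{\alpha k^{\alpha}}{1-(k/p)^{\alpha}} \cdot \frac{k^{1-\alpha} - p^{1-\alpha}}{\alpha - 1}$, which for $k$ = 4000, $p$ = 50 × 10^6 and $\alpha$ = 1.20 gives about 20,362 bytes. The sketch below computes that mean, draws flow sizes by inverse-CDF sampling, and generates Poisson flow arrivals; the arrival rate and horizon are hypothetical inputs, since the per-pair rates come from the demand matrix.

```python
import random


def bp_mean(k: float, p: float, alpha: float) -> float:
    """Mean of the bounded Pareto distribution BP(k, p, alpha), alpha != 1."""
    norm = 1.0 - (k / p) ** alpha
    return alpha * k ** alpha / norm * (k ** (1 - alpha) - p ** (1 - alpha)) / (alpha - 1)


def bp_sample(k: float, p: float, alpha: float, rng: random.Random) -> float:
    """Inverse-CDF sample of BP(k, p, alpha); values lie in [k, p]."""
    u = rng.random()
    return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)


def poisson_arrivals(rate: float, horizon: float, rng: random.Random):
    """Flow start times of a Poisson process (hypothetical rate in flows/s)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential inter-arrival times
        if t > horizon:
            return times
        times.append(t)
```

With $\alpha$ close to 1 the sample mean converges slowly because a few very large flows carry much of the volume, which is one reason to average over a long measurement window.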

We compare and contrast three TE policies using simulations. The shortest-path routing policy uses the minimum-hop path with the AIMD-ECN capability turned on and performs no traffic splitting. The second TE policy is Flow-based Multi-path with Shortest Delay (SD) and FIFO queuing. In this policy, SD refers to the specific RER setting $min_{th} = max_{th} = 0$; therefore, SD forwards each flow to the path with the minimum estimated queuing delay at the ingress edge node and does not necessarily favour the PP. Moreover, we use SD in conjunction with the FIFO queuing discipline, in which there is no preferential treatment between the PP and the SP at the core nodes. The third TE policy is the Flow-based Multi-path with RER and Strict Priority queuing approach proposed in this paper.

The goodput of TCP flow $i$ (in bit/s), denoted by $G_i$, is defined as the service rate received by flow $i$ during its lifetime. Mathematically, $G_i = \Delta_i / T_i$, where $\Delta_i$ is the number of bits successfully delivered to the application layer by the TCP receiver for flow $i$ and $T_i$ is the sojourn time of flow $i$ within the simulation runtime. We note that if flow $i$ terminates before the end of the simulation, then $\Delta_i$ equals the flow size $S_i$. One performance measure we study is the normalized average goodput, denoted by $G$.
However, some flows are not fully carried due to overloading of certain links in the network. To take this effect into account, we introduce a new performance measure, called the net average goodput and denoted by $G_{net}$,

$$G_{net} = \frac{\sum_i \Delta_i G_i}{\sum_i S_i},$$

which is obtained by equating the service rate of un-carried packets to zero. For the same purpose, we suggest a new measure, called the Byte Rejection Ratio (BRR), which quantifies the percentage of data that cannot be delivered within the simulation duration. Mathematically,

$$\mathrm{BRR} = \frac{\sum_{s,d} N(s,d) - \sum_{s,d} r(s,d)}{\sum_{s,d} N(s,d)} \times 100,$$



where N(s,d) is the sum of the sizes of flows demanded from node s to node d, 
and r(s, d) is the total traffic (in bytes) successfully delivered to the application 
layer from node s to node d. 
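A sketch of how the per-flow goodput $G_i$, the net average goodput and the BRR could be computed from per-flow simulation records. The `FlowRecord` type is hypothetical, delivered amounts are kept in bytes, and the net average goodput is assumed to take the byte-weighted form $\sum_i \Delta_i G_i / \sum_i S_i$, in which un-carried bytes contribute a rate of zero.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FlowRecord:
    """Hypothetical per-flow record collected at the end of a run."""
    src: str
    dst: str
    size_bytes: float        # S_i: flow size demanded
    delivered_bytes: float   # Delta_i: bytes delivered to the application
    sojourn_s: float         # T_i: sojourn time within the simulation run


def goodput_bps(f: FlowRecord) -> float:
    """G_i = Delta_i / T_i, in bit/s."""
    return 8.0 * f.delivered_bytes / f.sojourn_s


def net_average_goodput(flows: List[FlowRecord]) -> float:
    """Assumed form: sum_i Delta_i * G_i / sum_i S_i (un-carried bytes count at rate 0)."""
    num = sum(f.delivered_bytes * goodput_bps(f) for f in flows)
    return num / sum(f.size_bytes for f in flows)


def byte_rejection_ratio(flows: Iterable[FlowRecord]) -> float:
    """BRR in percent, from per-(s, d) demanded bytes N(s, d) and delivered bytes r(s, d)."""
    demanded, delivered = defaultdict(float), defaultdict(float)
    for f in flows:
        demanded[(f.src, f.dst)] += f.size_bytes
        delivered[(f.src, f.dst)] += f.delivered_bytes
    total_n = sum(demanded.values())
    return (total_n - sum(delivered.values())) / total_n * 100.0
```

For two flows of 1000 and 3000 bytes that each deliver 1000 bytes in 1 s and 2 s, this gives goodputs of 8000 and 4000 bit/s, a net average goodput of 3000 bit/s and a BRR of 50%.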

We first study the role of AIMD parameterization in the proposed TE in terms of $G_{net}$ and BRR. Figures 2(a) and 2(b) demonstrate the effect of RIF and RDF on $G_{net}$. Similarly, Figures 2(c) and 2(d) present the effect of these AIMD parameters on BRR. In these simulations, the RER parameters are chosen as $min_{th}$ = 1 ms and $max_{th}$ = 15 ms. We observe that flow-based multi-path routing with RER and strict priority queuing performs better than shortest-path routing in both measures. The choice RDF = 0.0625 and RIF = 0.0625 gives relatively good and robust performance in terms of $G_{net}$, and therefore we use these parameters in the rest of the paper.

The effects of the RER parameters on $G_{net}$ and BRR are presented in Figures 3(a) and 3(b), respectively. We observe that the performance of RER is quite robust except for choices of the RER parameters close to $min_{th} = max_{th} = 0$, i.e., the SD policy. We observe a sharp decline in system performance when we apply the SD policy, due to the induced knock-on effect. The simulation results show that $G_{net}$ for the multi-path routing policy with RER and Strict Priority satisfies $G_{net} > 5.50$ Mbit/s when the RER parameters are in the range $0 < min_{th} \le 1$ ms and $1 \le max_{th} \le 15$ ms. For the same example, $G_{net} \approx 5.24$ Mbit/s and $G_{net} \approx 3.90$ Mbit/s for the shortest-path routing policy with and without AIMD, respectively. This shows that, over a wide operational range of RER, the multi-path routing policy outperforms single-path routing policies, and the performance of RER converges to that of the shortest-path routing policy with AIMD as we increase $min_{th}$ and $max_{th}$. Based on these observations, we choose the RER parameters $min_{th}$ = 1 ms and $max_{th}$ = 15 ms from this wide operational range.

Finally, we scale the incoming traffic by multiplying the flow sizes by a scaling parameter $\gamma$, where $0.5 \le \gamma \le 1$, while fixing the flow arrival times. We then vary $\gamma$ to see its impact on network performance. In Fig. 4(a), the multi-path TE with strict priority and RER is shown to achieve the highest $G_{net}$ for all values of $\gamma$. It is also observed from Fig. 4(a) that the proposed TE approach outperforms the other policies in terms of $G$ as well. This shows that the multi-path TE with strict priority and RER not only carries more traffic but also transports the carried flows faster.

Fig. 2. As a function of RIF and RDF: (a) $G_{net}$ for the multi-path TE with strict priority and RER, (b) $G_{net}$ for shortest-path routing, (c) BRR for the multi-path TE with strict priority and RER, (d) BRR for shortest-path routing

Fig. 3. As a function of $min_{th}$ and $max_{th}$: (a) $G_{net}$ for the multi-path TE with strict priority and RER, (b) BRR for the multi-path TE with strict priority and RER

Fig. 4. As a function of the traffic scaling parameter $\gamma$: (a) $G_{net}$ and $G$, (b) Byte Rejection Ratio

In Fig. 4(b), we observe that the policy of multi-path routing with strict priority and RER has a BRR that is approximately half that of the shortest-path routing policy for $\gamma = 1$. As the offered traffic decreases, the gap between multi-path routing with strict priority and RER and shortest-path routing disappears. This is because the PP is not congested at light traffic loads, so multi-path routing nearly reduces to shortest-path routing. We also observe that SD routing with FIFO queuing gives a lower BRR than the proposed TE policy for some values of $\gamma$. However, the net goodput of multi-path routing with SD and FIFO queuing is 25-50% lower than that of the proposed TE approach when $\gamma$ varies between 0.5 and 1.0, as shown in Fig. 4(a).



4 Conclusions 

In this paper, we report our findings on a recently proposed TCP load balancing architecture that uses prioritized AIMD and flow-based multi-path routing with RER. Using a publicly available test network, we show that our proposed architecture consistently outperforms single-path routing in terms of average normalized goodput and byte rejection ratio. We show that the architecture stays robust for relatively large networks, extending our existing results for small topologies. On the other hand, we also show that employing load balancing with conventional FIFO queuing and shortest-delay policies does not always produce better results than single-path routing, which can be explained by the knock-on effect. Future work in this area will consist of incorporating a priori knowledge of the traffic demand matrix into the proposed architecture.
Abstract. The dualism of IGP and MPLS routing has raised a debate separating the IP community into divergent groups expressing different opinions concerning how the future Internet will be engineered. This paper presents an on-line traffic engineering model which uses a hybrid IGP+MPLS routing approach to achieve efficient routing of flows in IP networks. The approach, referred to as Hybrid, assumes a network-design process where an optimal network configuration is built around optimality, reliability and simplicity. The IGP+MPLS routing approach is applied to compute paths for the traffic offered to a 50-node network. Simulation reveals performance improvements compared to both IGP and MPLS routing in terms of several performance parameters, including routing optimality, network reliability and network simplicity.



1 Introduction 

The Internet has developed beyond the best-effort service delivered by traditional IP protocols into a universal communication platform where traffic management methods such as QoS and Traffic Engineering (TE), once the preserve of telephone networks, will be re-invented and used to meet differing resource requirements.

Traffic engineering allows traffic to be efficiently routed through the network by implementing QoS agreements between the available resources and the current and expected traffic. The dualism of IGP and MPLS routing has raised a debate separating the IP community into divergent groups with different views concerning how the future Internet will be engineered [1]. On one hand, there are proponents of the destination-based TE model using traditional IGP routing. They point to (1) the ability of the Internet to support substantial increases in traffic load without the need for TE mechanisms such as those proposed by MPLS and (2) the capability of traditional IGP routing to optimize routing through appropriate adjustments to the link weights. On the other hand, the advocates of the newly proposed MPLS standard argue for a flow-based TE model using source- or connection-oriented routing. In MPLS, packet forwarding and routing are decoupled to support efficient traffic engineering and VPN services. MPLS networks route traffic flows over bandwidth-guaranteed tunnels, referred to as Label Switched Paths (LSPs), which are set up and torn down through signaling.

Hybrid routing approaches, which either combine the best of destination-based and flow-based TE or allow a smooth migration from traditional IGP routing to the newly developed MPLS standard, have scarcely been addressed by the IP community. A trace-based analysis of the complexity of a hybrid IGP+MPLS network was presented in [2]. This analysis uses a trigger-based mechanism to (1) differentiate flows based on their measured bandwidth during the last minute and (2) route these flows differently according to their bandwidth characteristics. Other papers [3, 4, 5, 6] have addressed the problem of routing IP flows using offline IGP+MPLS approaches in which the network topology and traffic matrix are known a priori.

This paper presents a TE model which uses a hybrid IGP+MPLS routing approach 
to achieve efficient routing of flows in IP networks. These flows can be high bandwidth 
demanding flows (HBD) such as real-time streaming protocol flows or low bandwidth 
demanding flows (LBD) such as best-effort FTP flows. We assume an online network 
design process where an optimal network configuration is built around optimality, 
reliability and simplicity. We adopt a path selection model in which traffic flows are classified into LBD and HBD traffic classes at the ingress of the network and handled differently in the core, where LBD flows are routed using traditional IGP routing while HBD flows are carried over MPLS tunnels. The main contributions of this paper are:

- Reliability-related optimality. Reliability and optimality are the two key drivers for traffic engineering. These two TE objectives have been addressed separately by the IP community, although neither objective considered alone is a true measure of the efficiency of a network. We present a routing approach referred to as Hybrid which combines reliability and optimality into a mixed routing metric, minimizing both the number of flows to be re-routed under link failure (reliability) and the magnitude of flows rejected under heavy load conditions (optimality).

- Routing simplicity. Hybrid uses a simple design process in which no changes to the traditional routing algorithms are required besides designing the link cost metric to integrate reliability and optimality. By moving complexity into the design of the link cost, Hybrid leads to simplicity in terms of time/logical complexity and easy implementation by ISPs. Hybrid implements a path selection model in which a small number of LSPs are set up in an IGP network to overcome the network complexity resulting from the signaling operations required to set up and tear down LSPs in a pure MPLS network implementation.

- Routing performance improvements. Multiple metric routing using different TE 
objectives has been suggested to be used at best as an indicator in path selection [7] 
since (1) it may result in unknown composition rules for the different TE objectives and 
(2) it does not guarantee that each of the TE objectives is respected. An application to a 
50-node network reveals that, using multiple metric routing, Hybrid achieves an optimal 
network configuration outperforming IGP routing in terms of network optimality and 
reliability and MPLS routing in terms of the network simplicity. 

The remainder of this paper is organized as follows. Section 2 presents the IGP+MPLS routing approach. An application of the IGP+MPLS routing approach to compute paths for the flows offered to a 50-node network is presented in Section 3. Our conclusions are presented in Section 4.
2 The IGP+MPLS Routing Approach

Consider a network represented by a directed graph $(\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ is a set of nodes, $\mathcal{L}$ is a set of links, $N = |\mathcal{N}|$ is the number of nodes and $L = |\mathcal{L}|$ is the number of links of the network. Assume that the network carries IP flows that belong to a set of service classes $S = \{LBD, HBD\}$, where LBD and HBD denote the classes of low bandwidth demanding (LBD) and high bandwidth demanding (HBD) flows, respectively. Let $C_\ell$ denote the capacity of link $\ell$ and let $\mathcal{P}_{i,e}$ denote the set of paths connecting the ingress-egress pair $(i, e)$. Assume a flow-differentiated service model in which a request to route a class $s \in S$ flow of $d_{i,e}$ bandwidth units between an ingress-egress pair $(i, e)$ is received and future demands concerning IP flow routing requests are not known.

For each path $p$, let $L_p = \sum_{\ell \in p} L_\ell(n_\ell, r_\ell, \alpha(s), \beta(s))$ denote the path cost, where $L_\ell(n_\ell, r_\ell, \alpha(s), \beta(s))$ is the cost of link $\ell$ when carrying $n_\ell$ flows, $r_\ell$ is the total bandwidth reserved by the IP flows traversing link $\ell$, and $(\alpha(s), \beta(s))$ is a pair of network calibration parameters depending on the flow service class $s$. The flow routing problem consists of finding the best feasible path $p_s \in \mathcal{P}_{i,e}$ where

$$L_{p_s} = \min_{p \in \mathcal{P}_{i,e}} L_p \qquad (1)$$

$$d_{i,e} \le \min_{\ell \in p_s} (C_\ell - r_\ell). \qquad (2)$$

Equations (1) and (2) express the optimality of the routing process and the feasibility of the flows, respectively.

We consider a flow routing algorithm using a route optimization model based on a 
mixed cost model and a path selection model based on flow differentiation. 

2.1 The Route Optimization Model 

We consider a new cost metric which combines optimality and reliability with the expec- 
tation of routing the IP flows so that fewer flows are rejected under heavy load conditions 
and fewer flows are re-routed under link failure. 

- Reliability: the link loss. The main reliability objective of our routing approach is to minimize the damage to the network transport layer under failure. This damage is expressed by the number of re-routed flows under failure. Let $\mathcal{F}$ be a set of possible failure patterns, $w_f$ the probability of the failure pattern $f \in \mathcal{F}$ and $n_f$ the number of re-routed flows under failure $f$. The expected number of re-routed flows under the set of failure patterns $\mathcal{F}$ is defined by

$$W = \sum_{f \in \mathcal{F}} W_f = \sum_{f \in \mathcal{F}} w_f n_f \qquad (3)$$

where $W_f = w_f n_f$ expresses the damage to the network transport layer under failure event $f$.

Assuming that a fiber cut is the most likely failure event in optical networks, we consider the set of failure events $\mathcal{F} = \mathcal{L}$ and define a measure of reliability expressing the link loss by

$$W_\ell = w_\ell \sum_{r \in \mathcal{R}} \delta_{\ell,r} \qquad (4)$$

where $w_\ell$ is the probability for the IP flows to traverse link $\ell \in \mathcal{L}$, referred to as the link loss probability, $\mathcal{R} = \bigcup_{i,e} \mathcal{R}_{i,e}$ is the set of flows carried by the network, $\mathcal{R}_{i,e}$ is the subset of flows from node $i$ to node $e$, $n_\ell = \sum_{r \in \mathcal{R}} \delta_{\ell,r}$ is the total number of flows carried by link $\ell$, referred to as its interference, and

$$\delta_{\ell,r} = \begin{cases} 1 & \text{if flow } r \text{ traverses link } \ell \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$
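For single-link failures, the expected number of re-routed flows is the sum of $W_\ell = w_\ell n_\ell$ over all links. The sketch below computes the interference $n_\ell$ from the paths of the carried flows and then that sum, under the equal-probability assumption $w_\ell = 1/L$ that this section also uses for the routing metrics; flow paths given as node lists and directed links are assumptions of the sketch, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Link = Tuple[str, str]  # directed link (u, v), an assumption of this sketch


def path_links(path: Sequence[str]) -> List[Link]:
    return list(zip(path, path[1:]))


def interference(flow_paths: Sequence[Sequence[str]], links: Sequence[Link]) -> Dict[Link, int]:
    """n_l = sum_r delta_{l,r}: number of carried flows traversing each link."""
    n = {l: 0 for l in links}
    for path in flow_paths:
        for l in path_links(path):
            n[l] += 1
    return n


def expected_rerouted(flow_paths: Sequence[Sequence[str]], links: Sequence[Link],
                      w: Optional[Dict[Link, float]] = None) -> float:
    """Sum over links of W_l = w_l * n_l (each failure pattern is a single link cut)."""
    n = interference(flow_paths, links)
    if w is None:
        w = {l: 1.0 / len(links) for l in links}  # equal-probability assumption w_l = 1/L
    return sum(w[l] * n[l] for l in links)
```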

- Optimality: the congestion distance. Most routing algorithms which maximize bandwidth (optimality) assume a fair bandwidth sharing process where the different flows receive the same service on a link $\ell$. This service is expressed by the link residual bandwidth $C_\ell - r_\ell$. We consider a measure of optimality referred to as the link congestion distance, defined by

$$D_\ell(r_\ell, \beta(s)) = C_\ell - \beta(s)(r_\ell + d_{i,e}) \qquad (6)$$

where $\beta(s)$ is a calibration parameter expressing the subscription for bandwidth to the network link. Note that, in contrast to the residual bandwidth $C_\ell - r_\ell$, which is independent of the demand, the link congestion distance includes in its definition the bandwidth demand $d_{i,e}$, which can be low for LBD flows and high for HBD flows. It is expected that, by introducing unfairness among flows, our measure of optimality will lead to a link sharing model which improves the overall network performance by allowing the different flows to meet their service needs.

- A mixed cost metric. The routing optimization adopted in this paper is based on the assumption that a link metric minimizing the link loss (or equivalently the number of flows on a link) and maximizing the link congestion distance (or equivalently minimizing its inverse) can balance the number and magnitude of flows over the network to reduce rejection under heavy load conditions and re-routing under link failure. Hybrid achieves this objective by multiplying the link loss probability by power values of the link interference and the link congestion distance to form the mixed metric

$$L_\ell(n_\ell, r_\ell, \alpha(s), \beta(s)) = \frac{w_\ell \, n_\ell^{\alpha(s)}}{\left(C_\ell - \beta(s)(r_\ell + d_{i,e})\right)^{1-\alpha(s)}} \qquad (7)$$

where $0 \le \alpha(s) \le 1$ is a calibration parameter expressing the trade-off between reliability and optimality.
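Eqs. (6) and (7) translate directly into code. In this sketch the function names are hypothetical, and a non-positive congestion distance is mapped to an infinite cost, which is an added assumption (the text itself only requires feasibility through Eq. (2)).

```python
import math


def congestion_distance(C: float, r: float, d: float, beta: float) -> float:
    """D_l = C_l - beta(s) * (r_l + d_{i,e}), Eq. (6)."""
    return C - beta * (r + d)


def mixed_link_cost(w: float, n: int, C: float, r: float, d: float,
                    alpha: float, beta: float) -> float:
    """L_l = w_l * n_l**alpha / D_l**(1 - alpha), Eq. (7)."""
    D = congestion_distance(C, r, d, beta)
    if D <= 0:
        return math.inf  # assumption: non-positive congestion distance treated as unusable
    return w * n ** alpha / D ** (1 - alpha)
```

Raising $\alpha$ toward 1 shifts the weight from the congestion distance to the flow count $n_\ell$, i.e., from optimality toward reliability.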

2.2 The Path Selection Model 

The basic idea behind our path selection model is to differentiate flows into classes 
based on their bandwidth requirements and route these flows using different cost metrics 
according to their service needs. 

- Flow differentiation. In Hybrid, flows are classified into LBD and HBD traffic classes depending on their bandwidth requirements $d_{i,e}$. The two traffic classes are defined by

$$LBD = \{\text{class of flows requesting } d_{i,e} \text{ bandwidth units} \mid d_{i,e} \le \tau\} \qquad (8)$$

$$HBD = \{\text{class of flows requesting } d_{i,e} \text{ bandwidth units} \mid d_{i,e} > \tau\} \qquad (9)$$

where each flow bandwidth demand $d_{i,e}$ is uniformly distributed in the range $[0, M]$ and $0 < \tau < M$ is a cut-off parameter defining the limit between LBD and HBD flows.
- Routing metrics. In Hybrid, the IP flows are routed using different routing metrics expressing their service needs: the paths followed by LBD flows are found using the IGP-based OSPF model, while HBD flows are routed over MPLS Label Switched Paths (LSPs). This is achieved using the link cost (7), which can lead to different routing metrics depending on the values of the link loss probability $w_\ell$ and the set of parameters $(\alpha(s), \beta(s))$. The routing metrics leading to IGP-based OSPF and MPLS routing are obtained by setting (1) the link loss probability to a constant value expressing an equal probability assumption, where each link $\ell$ has the same probability $w_\ell = 1/L$ to carry any traffic flow, and (2) the set of parameters $(\alpha(s), \beta(s))$ to $(\alpha, 1)$ for MPLS routing and $(0, 0)$ for IGP-based OSPF routing. With $(0, 0)$, the cost (7) reduces to $1/(L\,C_\ell)$, an inverse-capacity metric that does not depend on the current load. Note that the link loss probability $w_\ell$ can be assigned appropriate weights to move the traffic away from bottleneck links, as proposed in the OSPF weight optimization TE model [8]. However, the optimal weight setting may require an offline computation process, which is beyond the scope of this paper.

- Routing algorithm. Consider a request to route a class $s$ flow of $d_{i,e}$ bandwidth units between two nodes $i$ and $e$. The proposed algorithm (hereafter referred to as HYBR) executes the following steps to route this flow:

1. Network calibration.

set $(\alpha(s), \beta(s)) = (0, 0)$ if $s \in$ LBD, or
set $(\alpha(s), \beta(s)) = (\alpha, 1)$ if $s \in$ HBD.

2. Path selection. 

a) Traffic aggregation, if $s \in$ HBD:

- find an existing LSP which can accommodate the new HBD request;

- if found, then (a) set $p_s := p$, where $p$ is the path carrying the existing LSP, and (b) go to step 3.

b) Prune the network. Set $L_\ell(n_\ell, r_\ell, \alpha(s), \beta(s)) = \infty$ for each link $\ell$ whose link slack $C_\ell - r_\ell < d_{i,e}$.

c) Find a new least-cost path. Apply Dijkstra's algorithm using the link cost (7) to find a new least-cost path $p_s$ from $i$ to $e$. When $s \in$ HBD, this path is used to set up a new MPLS LSP for HBD flows.

3. Route the request. 

- Assign the traffic demand $d_{i,e}$ to path $p_s$.

- Update the link occupancy and interference. For each link $\ell \in p_s$

set $r_\ell := r_\ell + d_{i,e}$ and $n_\ell := n_\ell + 1$.

Note that the path selection algorithm has the same complexity as Dijkstra's algorithm: $O(N^2)$.
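The HYBR steps above, including the LBD/HBD split of Eqs. (8) and (9), can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. It assumes at most one LSP per I-E pair, never tears LSPs down, and uses the same constant loss probability w_ℓ = 1 on every link (any common constant yields the same paths). All names (`Link`, `dijkstra`, `route_hybr`) are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass

@dataclass
class Link:
    capacity: float         # C_l
    reserved: float = 0.0   # r_l, bandwidth already reserved
    flows: int = 0          # n_l, link interference
    loss_prob: float = 1.0  # w_l, constant (equal-probability assumption)

def link_cost(link, d, alpha, beta):
    """Eq. (7); a non-positive residual is treated as an unusable link."""
    slack = link.capacity - beta * (link.reserved + d)
    if slack <= 0:
        return math.inf
    return link.loss_prob * link.flows ** alpha / slack ** (1 - alpha)

def dijkstra(links, src, dst, weight):
    """Least-cost node path src -> dst; links with infinite weight are skipped."""
    adj = {}
    for (u, v), link in links.items():
        adj.setdefault(u, []).append((v, link))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == dst:
            break
        if du > dist[u]:
            continue  # stale heap entry
        for v, link in adj.get(u, []):
            c = weight(link)
            if math.isinf(c):
                continue  # pruned link (step 2b)
            if du + c < dist.get(v, math.inf):
                dist[v], prev[v] = du + c, u
                heapq.heappush(heap, (du + c, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def edges(path):
    return list(zip(path, path[1:]))

def route_hybr(links, lsps, i, e, d, tau, alpha):
    """Route one flow of d units from i to e; return its path, or None if rejected."""
    hbd = d > tau                                   # Eqs. (8)-(9)
    a, b = (alpha, 1.0) if hbd else (0.0, 0.0)      # step 1: calibration
    path = None
    if hbd and (i, e) in lsps:                      # step 2a: aggregation
        candidate = lsps[(i, e)]
        if all(links[l].capacity - links[l].reserved >= d for l in edges(candidate)):
            path = candidate
    if path is None:
        def weight(link):                           # step 2b: pruning
            if link.capacity - link.reserved < d:
                return math.inf
            return link_cost(link, d, a, b)
        path = dijkstra(links, i, e, weight)        # step 2c
        if path is None:
            return None
        if hbd:
            lsps[(i, e)] = path                     # set up a new LSP
    for l in edges(path):                           # step 3: route the request
        links[l].reserved += d
        links[l].flows += 1
    return path
```

This version uses a binary heap, so each Dijkstra run costs O((N + L) log N) rather than the array-based O(N²) bound quoted above.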

3 An Implementation 

This section presents simulation experiments conducted using a 50-node test network to compare the performance of (1) IGP routing using the Open Shortest Path First (OSPF) model [9], (2) MPLS routing using the Least Interference Optimization Algorithm (LIOA) [10], and (3) hybrid IGP+MPLS routing using the newly proposed HYBR algorithm. The 50-node network used in our experiments includes 2450 ingress-egress pairs and 202 links capacitated with 38,519,241 units of bandwidth. Each node is potentially an edge node. Figure 1 presents a graphical representation of the 50-node network and the simulation parameter values. The parameter values for the simulation experiments are presented in terms of the offered flow routing requests, the value of the calibration parameter $\alpha$, the flow request rates $\lambda$, the flow holding time $1/\mu$, the cut-off parameter $\tau$, the maximum bandwidth demand $M$, the number of simulation trials $T$ and the number of flow requests per trial $R$. The flow arrival and service processes are Poisson. We considered short-lived flows only, since initial experiments revealed the same performance patterns for short-lived ($1/\mu = 1$) and long-lived flows ($1/\mu = \infty$).

Fig. 1. The 50-node network and simulation parameters: (a) the 50-node network; (b) the parameter values.

3.1 Performance Parameters 

The relevant performance parameters used in the simulation experiments are (1) the quality of the paths, expressed by the path length, the path multiplicity and the preferred path usage; (2) the network optimality, expressed by the percentage flow acceptance ACC and the average link utilization UTIL; (3) the network reliability, expressed by the average link interference AV and the maximum link interference MAX; and (4) the network simplicity, expressed by the network gain GAIN. The path length is the average path length in number of hops. It gives an indication of resource consumption: a longer path ties up more resources than a shorter path. The path multiplicity is the average number of paths used by a source-destination pair. It gives an indication of the load-balancing capability of an algorithm: a higher value of this parameter indicates a more balanced network. The preferred path usage expresses how often a path-finding algorithm routes flows on the preferred path connecting an I-E pair (the path most used by an I-E pair is defined as its preferred path). ACC is the percentage of flows which have been successfully routed by the network. UTIL is the average link load; it expresses how far a link is from its congestion region. AV is the average number of flows carried by the network links, and MAX is the number of flows carried by the most interfering link. AV and MAX express the number of flows which must be re-routed upon failure: an algorithm which achieves lower interference is more reliable, since it leads to re-routing fewer flows upon failure. GAIN is the reduction in the number of signaling operations (LSP setup and tear-down) resulting from the implementation of Hybrid instead of a pure MPLS implementation.

Fig. 2. Experiment 1. The quality of the paths.
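The performance parameters defined above can be computed from a log of routed flows roughly as in the sketch below. The input format and function names are our own assumptions. In particular, preferred path usage is computed here as the share of accepted flows that follow their I-E pair's most-used path, and GAIN as the percentage reduction in LSP signaling operations relative to pure MPLS.

```python
from collections import Counter, defaultdict
from statistics import mean

def path_quality(routed):
    """routed: list of (ie_pair, path) for accepted flows, path as a node sequence.

    Returns (average hops, path multiplicity, preferred path usage in %).
    """
    hops = mean(len(path) - 1 for _, path in routed)
    per_pair = defaultdict(Counter)
    for pair, path in routed:
        per_pair[pair][tuple(path)] += 1
    multiplicity = mean(len(paths) for paths in per_pair.values())
    # Preferred path = the path most used by an I-E pair.
    on_preferred = sum(paths.most_common(1)[0][1] for paths in per_pair.values())
    return hops, multiplicity, 100.0 * on_preferred / len(routed)

def network_metrics(offered, accepted, link_loads, capacities, flows_per_link):
    """Returns (ACC in %, UTIL in %, AV, MAX)."""
    acc = 100.0 * accepted / offered
    util = 100.0 * mean(load / cap for load, cap in zip(link_loads, capacities))
    return acc, util, mean(flows_per_link), max(flows_per_link)

def gain(signaling_ops_hybrid, signaling_ops_mpls):
    """Percentage reduction of LSP setup/tear-down operations vs. pure MPLS."""
    return 100.0 * (signaling_ops_mpls - signaling_ops_hybrid) / signaling_ops_mpls
```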



3.2 Simulation Experiments 

Four simulation experiments were conducted to analyze (1) the quality of paths used by the LBD flows, the dominant flows in the network; (2) the impact of the calibration parameter $\alpha$ on the network efficiency; (3) the impact of the cut-off parameter $\tau$ on the network efficiency; and (4) the impact of the demand range $[0, M]$ on the network efficiency. The results of the experiments are presented in Figures 2, 3, 4 and Table 1. Each entry in Table 1 presents the average of each of the performance parameters described above (ACC, UTIL, AV, MAX and GAIN). These averages (considered for different values of the $\alpha$ parameter) are computed with a 95% confidence interval within 0.1% of the point estimates. Our choice of the best performance values is based on a trade-off between the values achieved by the different performance parameters. The best performance is indicated in bold.

Experiment 1. The quality of paths for LBD flows. The quality of the paths carrying LBD flows, illustrated by Figure 2, shows that approximately 70% of the routes used by the three routing algorithms are at most 3 hops long (Figure 2 (a)). The three algorithms thus perform equally well in terms of resource consumption. HYBR achieves the best route multiplicity and route usage, while OSPF performs worst in terms of path multiplicity (Figure 2 (b)) and path usage (Figure 2 (c)). These results show that HYBR achieves the best stability in terms of path selection and balances the flows over the network better than the LIOA and OSPF models.

Experiment 2. The impact of the calibration parameter $\alpha$. The results presented in Table 1 show that IP routing using OSPF achieves the highest values of the link interference (AV and MAX) while still exhibiting the lowest average link utilization UTIL. This agrees with routing in the Internet, where OSPF routing can lead to low link utilizations (typically 30%) on some links while other links carry the majority of the traffic flows. MPLS routing using LIOA achieves (1) the lowest values of the MAX and AV parameters and (2) higher acceptance ACC compared to OSPF routing. For the cut-off parameter setting $\tau = 150$, HYBR reduces the network complexity by setting up 30% fewer LSPs than pure MPLS routing using the LIOA model. The rest of this section shows that this percentage can be increased by choosing a suitable value of the cut-off parameter $\tau$, but at the price of reducing the flow acceptance and increasing the link interference. For the type of traffic considered, mapping new HBD flows to existing LSPs did not significantly increase the network gain compared to the case where HBD flows were not aggregated. This is because, under the short holding time assumption, the LSPs created on an I-E pair are torn down before new HBD requests arrive on the same I-E pair.

Experiment 3. The impact of the cut-off parameter $\tau$. The results presented in Figure 3 show that in IGP+MPLS routing, the network gain (simplicity) is increased at the price of degrading the network optimality (Figure 3 (a)) and the network reliability (Figure 3 (b)). Finding a network operational point which balances reliability, optimality and simplicity is therefore an important aspect of the IGP+MPLS routing model. Consider the two functions $A(x)$ and $M(x)$ representing, respectively, the flow acceptance $ACC(x)$ and scaled values of the maximum interference $MAX(x)$, where $M(x) = 100 \cdot MAX(x) / \max_{k} MAX(k)$, the maximum being taken over all cut-off values $k$ considered. Figure 3 (c) depicts two curves obtained by approximating the data (values of the functions $M(x)$ and $A(x)$) with a Bézier curve of degree $n = 11$ (the number of data points) that connects the endpoints. This figure shows that, for the network model under consideration, a network operational point which balances optimality, reliability and simplicity may be found at the intersection of the two curves $M(x)$ and $A(x)$, located around $\tau = 300$: a cut-off value corresponding to approximately 60% network gain, 89% flow acceptance and the reliability values AV = 437 and MAX = 1643, where AV values lie in the average interference range [426, 451] and MAX values lie in the maximum interference range [1267, 1739].
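The scaling of MAX and the crossing of the two Bézier approximations described above can be reproduced along the following lines. The sketch uses the data points themselves as Bézier control points, so each curve passes through its first and last data points. Because both curves share the same x control values, they also share the same x coordinate for a given parameter t. The crossing can therefore be found by scanning t for a sign change of A − M. The function names and the scan resolution are illustrative assumptions.

```python
from math import comb

def scale_max(max_values):
    """M(x) = 100 * MAX(x) / max_k MAX(k)."""
    top = max(max_values)
    return [100.0 * m / top for m in max_values]

def bezier(points, t):
    """Point at parameter t in [0, 1] on the Bezier curve with the given control points."""
    n = len(points) - 1
    x = y = 0.0
    for k, (px, py) in enumerate(points):
        b = comb(n, k) * (1 - t) ** (n - k) * t ** k   # Bernstein basis polynomial
        x += b * px
        y += b * py
    return x, y

def crossing(xs, a_values, m_values, steps=1000):
    """Approximate x at which the Bezier approximations of A(x) and M(x) intersect.

    Both curves use the same x control values, so at a given t they share the
    same x coordinate; only a sign change of A - M along t has to be detected.
    """
    curve_a, curve_m = list(zip(xs, a_values)), list(zip(xs, m_values))
    prev_diff = None
    for s in range(steps + 1):
        t = s / steps
        x, ya = bezier(curve_a, t)
        _, ym = bezier(curve_m, t)
        diff = ya - ym
        if prev_diff is not None and prev_diff * diff <= 0:
            return x
        prev_diff = diff
    return None
```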

Experiment 4. The impact of the demand range $[0, M]$. The curves depicted in Figure 4 (a), (b) and (c) represent the values of the flow acceptance, the average link utilization and the link interference, respectively, for different demand ranges. These curves reveal that for all three algorithms the flow acceptance ACC decreases with increasing range while the link utilization increases. Figure 4 (a) shows that (1) LIOA and HYBR achieve the same flow acceptance, which is higher than that of OSPF; (2) the flow acceptance is the same for all three algorithms for small values of $M$ ($M < 300$); and (3) for higher values of $M$ ($M > 300$), HYBR and LIOA perform better than OSPF. HYBR, LIOA and OSPF routing lead the network into a congestion region for a value $M = Thr$ at which, on average, the network rejects more than 20% of its offered flows. We refer to this value as the network congestion threshold. Simulation reveals that this value is lower for OSPF ($Thr = 525$) than for HYBR and LIOA ($Thr = 600$). This finding reveals that under varying traffic conditions a network implementing OSPF routing enters its congestion region more quickly than a network implementing HYBR or LIOA routing. Figures 4 (b) and (c) reveal the same performance pattern as Table 1: OSPF achieves the lowest link utilization but higher interference compared to the HYBR and LIOA models.
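The congestion threshold Thr, defined above as the value of M at which the network starts rejecting more than 20% of its offered flows, can be read off a series of (M, ACC) measurements as in the sketch below. The input format and function name are hypothetical, and the values in the test are toy numbers rather than results from the paper.

```python
def congestion_threshold(results, max_rejection=20.0):
    """results: iterable of (M, ACC in %) pairs.

    Returns the smallest M whose rejection rate 100 - ACC exceeds max_rejection,
    or None if the network never enters its congestion region.
    """
    for m, acc in sorted(results):
        if 100.0 - acc > max_rejection:
            return m
    return None
```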



4 Conclusion 

This paper presents Hybrid, an IGP+MPLS routing approach which combines route optimization and efficient path selection to achieve efficient routing of flows in IP networks. The route optimization model adopted by Hybrid combines reliability and optimality to re-route fewer flows under link failure and reject fewer flows under heavy load conditions. Preliminary simulation results using a 50-node network reveal that the hybrid IGP+MPLS routing approach performs as well as an MPLS solution in terms of optimality. IGP+MPLS routing outperforms IGP routing in terms of network reliability and network optimality. Finally, the IGP+MPLS routing approach performs better than MPLS routing in terms of network simplicity and finds better paths than both MPLS routing and IGP routing.

Online Traffic Engineering: A Hybrid IGP+MPLS Routing Approach 143

In Hybrid, IGP routing is achieved by setting the OSPF link metric inversely proportional to the link capacity. Hybrid routing approaches combining MPLS routing and OSPF weight optimization are expected to lead to further performance improvements, and our ongoing work in the hybrid IGP+MPLS framework focuses on these improvements. The performance of the IGP+MPLS approach when routing a mix of short- and long-lived flows is also under investigation.
Abstract. Genetic algorithms are a useful tool for link weight optimization in intra-domain traffic engineering, where the maximum link load is to be minimized. As a local heuristic, the weight of the most loaded link is increased to speed up the search for a near-optimal solution. We show that implementing this heuristic as directed mutation outperforms an implementation as an inner loop in both quality of the result and number of calls to the objective function when used together with caching. Optimal mutation rates result in surprisingly high cache hit ratios.

1 Introduction 

One goal in intra-domain traffic engineering is to distribute the load of the network so that no link is congested. For routing protocols based on link weights and shortest paths, like OSPF [1] and IS-IS [2], this goal can be achieved by selecting the link weights, which determine the shortest paths, in such a way that the maximum over all link loads is minimal. The problem of computing optimal link weights for a given network topology with link capacities and traffic matrix is NP-hard [3][4].

Besides other optimization schemes like tabu search [4][5][6][7][8], genetic algorithms have been used to solve this optimization problem [9][10][11][12][13]. Genetic algorithms mimic the evolutionary adaptation of a biological population to its environment (given by an objective function) through selection (“survival of the fittest”), crossover (exchange of genetic substrings between chromosomes) and mutation [14]. Beyond the plain genetic algorithm, a local heuristic which increments the weight of the most loaded link in the network can be applied for link weight optimization.

The local heuristic is typically implemented as a loop which increments the weight of the most loaded link until the overall maximum of all loads increases. In this paper, this strategy is analyzed, and a different approach called directed mutation is proposed and evaluated. We also use caching to avoid calls to the objective function for chromosomes which have already been evaluated in earlier generations. The results show that the new strategy outperforms the existing one in terms of both quality of the result and number of calls to the (expensive) objective function. The latter is due to high cache hit ratios.

(ministry for education and research) of the Federal Republic of Germany under contract 01AK045. The authors alone are responsible for the content of the paper.

The fact that an optimal mutation rate yields relatively high cache hit ratios 
is surprising because a high cache hit ratio clearly indicates that many points 
in the search space are visited repeatedly. Increasing the mutation rate should 
therefore lead to better exploration of the search space and improve the result. 
We found this assumption to be wrong and present a strange effect: the best 
strategy investigated re-creates on average each potential solution twice. 

The paper is organized as follows: In Sec. 2 we summarize the objective function, genetic algorithms and the current implementation strategy of the local heuristic. In Sec. 3, we analyze this approach and propose a new strategy called directed mutation, together with caching. The new scheme is evaluated in Sec. 4 and discussed in Sec. 5. Sec. 6 concludes the paper with further possible improvements. Related work is discussed throughout the paper.



2 Background 

An instance of the problem to be solved is given as a tuple $(G, \kappa, \mu, \varphi)$, where

- $G = (V, E)$ is the network graph, where $V$ is the set of nodes and $E \subseteq V \times V$ is the set of directed links without self-loops,

- $\kappa(u, v) > 0$ is the capacity of each link $(u, v) \in E$,

- $\mu(u, v) > 0$ is a metric, i.e., the link weights for all $(u, v) \in E$, and

- $\varphi(s, t) \ge 0$ is the bit rate of a flow from source $s \in V$ to target $t \in V$.

With ECMP [1], the shortest paths are fully determined by the graph $G$ and the link weights $\mu$. The traffic matrix $\varphi$ and the link capacities $\kappa$ furthermore fully determine the relative load $\rho(u, v) \ge 0$, which is the ratio of (absolute) load and capacity for each link. If $G$, $\varphi$ and $\kappa$ are fixed, the link load $\rho$ and, in particular, the maximum link load $\rho^*$ depend only on the link weights $\mu$. This defines the objective function $\Omega$ we are interested in as

$$\rho^* = \Omega(\mu),$$

which returns the maximum load for a given set of link weights. The goal of the optimization problem is to find link weights $\mu_0$ so that $\rho^* = \Omega(\mu_0)$ is minimal.
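A minimal, non-incremental Python sketch of the objective function Ω(μ) under ECMP follows. For each destination t, it runs Dijkstra on the reversed graph and then pushes the demand towards t from the farthest nodes inward, splitting traffic evenly over all equal-cost next hops. It assumes positive integer weights, so equal-cost ties compare exactly, and it ignores demands whose destination is unreachable. The name `max_link_load` and the dictionary-based input format are hypothetical.

```python
import heapq
from collections import defaultdict

def max_link_load(nodes, capacity, weight, demand):
    """Objective Omega(mu): maximum relative link load rho* under ECMP routing.

    capacity, weight: dict mapping a directed link (u, v) to kappa(u, v), mu(u, v)
    demand: dict mapping (s, t) to the bit rate phi(s, t)
    Returns (rho_star, most_loaded_link).
    """
    into, out = defaultdict(list), defaultdict(list)
    for (u, v), w in weight.items():
        into[v].append((u, w))
        out[u].append(v)
    load = defaultdict(float)
    for t in nodes:
        # Dijkstra on the reversed graph: dist[u] = shortest distance from u to t.
        dist, heap = {t: 0}, [(0, t)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist[v]:
                continue
            for u, w in into[v]:
                if d + w < dist.get(u, float("inf")):
                    dist[u] = d + w
                    heapq.heappush(heap, (d + w, u))
        # Traffic destined to t currently held at each node.
        held = defaultdict(float)
        for (s, dst), rate in demand.items():
            if dst == t and s != t and s in dist:   # unreachable demand is ignored
                held[s] += rate
        # Farthest nodes first: with positive weights, every ECMP next hop is closer to t.
        for u in sorted(dist, key=dist.get, reverse=True):
            if u == t or held[u] == 0:
                continue
            hops = [v for v in out[u] if v in dist and dist[u] == weight[(u, v)] + dist[v]]
            share = held[u] / len(hops)             # even split over equal-cost paths
            for v in hops:
                load[(u, v)] += share
                held[v] += share
    rho = {link: load[link] / capacity[link] for link in capacity}
    worst = max(rho, key=rho.get)
    return rho[worst], worst
```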

Other objective functions or additional constraints are also possible. For ex- 
ample, link failures can be taken into account to minimize the maximum load for 
the worst case failure [8] [15] [16], or to find new link weights so that only a min- 
imal subset of weights has to be configured in routers after a failure [5] . Instead 
of minimizing the maximum link load, a function thereof can be minimized, like 
the piece-wise linear function in [4] [5] [6] [15] [16]. 

Throughout this paper, we consider only the base case described above, noting merely that this work is not restricted to it.
2.1 Genetic Algorithms

Genetic Algorithms (GAs) belong to the class of evolutionary algorithms used in optimization. They mimic the process of biological evolution, where each individual of a population with a given size is described by its genetic code, called a chromosome. The chromosomes here are the sets of link weights $\mu$, ordered as a sequence. The initial population is made up of random sequences. Each chromosome has a chance to be selected for reproduction, where the probability of being selected increases with the fitness of the chromosome. In our case, the fitness is the reciprocal of the maximum link load, $1/\rho^*$; thus, lower values of $\rho^*$ increase the probability of selection.

During reproduction, a crossover operator is applied to a pair of selected chromosomes. With N-point crossover, N randomly selected positions in both parents yield N+1 subsequences, every second of which is exchanged between the parents, so that two merged offspring are formed. Whether crossover is applied is determined by a usually high probability (0.9 to 1.0). The idea behind crossover is that good properties of both parents are combined in a child, creating an even better chromosome. More complex crossover operators are also possible, for example the one described in [10].

After crossover, the resulting chromosomes are mutated by replacing randomly selected weights with new random values. Mutation maintains variability in the search space by introducing new genetic information. The mutation probability normally has a low value (about 0.05 per link weight).

Finally, the best chromosome of the current population is always copied to 
the new generation, a strategy called elitism [14]. The entire process is repeated 
until a termination condition is met, usually just a given number of generations. 
The result is the best chromosome ever evaluated. 
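The GA operators described in this subsection can be sketched as follows. This is an illustrative Python version, not the authors' C++ code. Here `select` stands in for whatever selection scheme is used (rank selection in Sec. 4). The number of crossover points is set to half the chromosome length, following Sec. 4, and the best chromosome is copied unchanged (elitism).

```python
import random

def n_point_crossover(a, b, n_points, rng):
    """Cut both parents at n_points random positions and swap every second segment."""
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    child1, child2 = list(a), list(b)
    start, swap = 0, False
    for end in cuts + [len(a)]:
        if swap:
            child1[start:end], child2[start:end] = b[start:end], a[start:end]
        start, swap = end, not swap
    return child1, child2

def mutate(chromosome, rate, low, high, rng):
    """Random mutation: each weight is replaced by a new random value with probability rate."""
    return [rng.randint(low, high) if rng.random() < rate else w for w in chromosome]

def next_generation(ranked, select, p_cross, rate, low, high, rng):
    """ranked: population sorted best-first (chromosomes of length >= 2).

    select(ranked, rng) picks one parent.
    """
    n_points = max(1, len(ranked[0]) // 2)
    new_pop = [list(ranked[0])]                  # elitism: best chromosome survives
    while len(new_pop) < len(ranked):
        a, b = select(ranked, rng), select(ranked, rng)
        if rng.random() < p_cross:
            a, b = n_point_crossover(a, b, n_points, rng)
        for child in (a, b):
            if len(new_pop) < len(ranked):
                new_pop.append(mutate(child, rate, low, high, rng))
    return new_pop
```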



2.2 The Local Heuristic as an “Inner Loop” 

GAs as described above are applicable to any optimization problem as long as a representation as chromosomes with a crossover operator and an objective function are defined. For example, Riedl et al. [17] use a GA to optimize entire network topologies.

For the specific problem of link weight optimization, an additional local heuristic can be applied. After evaluation of a chromosome, let $(u, v)$ be the most loaded link, with $\rho(u, v) = \rho^*$. It is then reasonable to increase the weight $\mu(u, v)$: this “lengthens” all shortest paths containing $(u, v)$ and tends to shift load off that link, so one can expect the most loaded link $(u, v)$ to carry less load than before.

In [10][11], this local heuristic is implemented in the form of an inner loop, basically as follows: When a chromosome is evaluated, the objective function is called to determine the maximum load and the associated link weight. Then a loop is entered which increments this weight by one, and the objective function is called again, resulting in a new maximum load and link. This is repeated until the new maximum load becomes greater than the previous one. After each iteration, the most loaded link may change, so the weights of several links might have been incremented by the time the loop is left. A single link weight could also have been incremented more than once. In other words, the inner loop finds a nearby local minimum.

Fig. 1. Frequencies of weight increments (a) and number of incremented weights per invocation of the inner loop (b).
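The inner-loop variant of the heuristic, as described for [10][11], amounts to the loop below. The `objective` callable is assumed to return the maximum load together with the index of the most loaded link. The `max_steps` guard is our own addition. Without it, a link with no alternative route (e.g., a bridge) would never change the maximum load, and the loop could run forever.

```python
def inner_loop(weights, objective, max_steps=10_000):
    """Local heuristic implemented as an inner loop.

    objective(weights) -> (max_load, index_of_most_loaded_link).
    Returns (weights, max_load, number_of_objective_calls).
    """
    w = list(weights)
    best, link = objective(w)
    calls = 1
    for _ in range(max_steps):
        w[link] += 1
        load, new_link = objective(w)
        calls += 1
        if load > best:
            w[link] -= 1      # the increment made things worse: undo it and stop
            return w, best, calls
        best, link = load, new_link
    return w, best, calls
```

When a single weight improves after exactly one increment, this loop makes three objective calls, which matches the count discussed in Sec. 3.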

The objective function needs to compute the shortest path sink trees for all 
nodes and to propagate the traffic demands along the computed paths. Since 
the inner loop changes only one link weight per iteration, incremental algo- 
rithms which reuse state from previous computations can be used to increase 
efficiency [10]. 

A GA with a local heuristic is also called a “memetic algorithm” [9].

3 Analysis and New Approach 

The GA runs about 340 generations for a population of 21 chromosomes, making 31521 calls of the objective function, for a network with 20 nodes and 106 unidirectional links (cf. Sec. 4 for further parameters). This means that on average $31521/340 \approx 92.7$ calls to the objective function are made per generation, resulting in an average of $92.7/21 \approx 4.4$ calls per chromosome.

Fig. 1(a) shows the frequencies of weight increments, and Fig. 1(b) shows 
the frequencies of the number of incremented weights per invocation of the inner 
loop. The inner loop increments a single weight up to 51 times, and up to 19 
weights per invocation. But most often, a weight is incremented only once or not 
at all, and most often, no weight is successfully incremented. Repeating the run 
with different seeds of the random number generator yields similar results. 

These results do not imply the inner loop is useless; the GA converges faster 
with the local heuristic than without it. But the inner loop is still expensive. 
When a single weight is incremented by exactly one, the objective function is called three times: the first time to get the initial maximum load and most loaded link, the second time after the first increment, and the third time just to find that the second increment worsened the result and therefore needs to be undone.

Furthermore, the cases where many weights are incremented by large amounts occur during the early generations of a run, when the population still contains much random information from the initial, randomly created chromosomes. Such chromosomes are still far from global minima. For example, it is not unlikely for randomly generated weights to violate the triangle inequality¹. In other words, the inner loop is most efficient at finding local minima which are unlikely to also be global minima.



3.1 Directed Mutation 

Instead of implementing the local heuristic as an inner loop which always “falls” 
into a nearby local minimum, we implemented it as directed mutation , as follows: 
A chromosome is evaluated normally by calling the objective function only once, 
but the most loaded link for this chromosome is remembered. Later, if the chro- 
mosome is selected for the next generation, not only random mutation is applied, 
but also the weight of the remembered link is incremented by one. This can be 
seen as a non-random mutation in the sense that it is a modifying operation on 
a single chromosome (while crossover operates on at least two). This approach 
makes the local heuristic only a trend towards a nearby local minimum, instead 
of a strict rule which always explores the minimum, and no additional calls of 
the objective function are required. 

The drawback is that directed mutation interferes with crossover, which is 
normally performed before mutation. Crossover changes the chromosome, so 
that the remembered information about the most loaded link is invalidated. 
Therefore, directed mutation can be applied only to chromosomes which are 
not changed by crossover. To deal with this problem, we simply decrease the crossover probability so that more chromosomes are left unchanged².
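Directed mutation, as described above, can be sketched as a reproduction step that remembers each parent's most loaded link. The sketch rests on three assumptions of ours. First, one-point crossover is used for brevity, whereas the paper uses N-point crossover. Second, the remembered link index is dropped whenever crossover is applied. Third, the directed increment is applied before random mutation, an ordering the text does not specify. The default parameter values are taken from values mentioned in the paper; the function name is hypothetical.

```python
import random

def reproduce(parent_a, parent_b, rng, p_cross=0.3, p_mut=0.05, low=1, high=30):
    """Create two children with directed mutation.

    Parents are (weights, hot_link) pairs, where hot_link is the index of the most
    loaded link remembered from the parent's evaluation, or None.
    """
    (wa, hot_a), (wb, hot_b) = parent_a, parent_b
    if rng.random() < p_cross:
        # Crossover (one-point here) invalidates the remembered most loaded links.
        cut = rng.randrange(1, len(wa))
        children = [(wa[:cut] + wb[cut:], None), (wb[:cut] + wa[cut:], None)]
    else:
        children = [(list(wa), hot_a), (list(wb), hot_b)]
    result = []
    for weights, hot in children:
        weights = list(weights)
        if hot is not None:
            weights[hot] += 1                      # directed mutation
        weights = [rng.randint(low, high) if rng.random() < p_mut else x
                   for x in weights]               # ordinary random mutation
        result.append(weights)
    return result
```

No extra objective calls are needed: the most loaded link comes from the parent's earlier evaluation.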

Directed mutation is similar to, but less complex than, the approach described in [13], where all link weights are increased if their loads exceed certain thresholds. Additionally, all link weights are decreased if their loads fall below other thresholds. We found that decreasing the weights of lightly loaded links does not improve the convergence of the GA.



1 The triangle inequality is violated if a link weight is greater than the weight of a detour connecting the endpoints of the link. With shortest path routing, such links remain unused for every destination.

2 The other approach, doing directed mutation before crossover, leads to worse results. 
3.2 Caching

In addition, we added an evaluation cache to increase efficiency. Each chromosome that is not yet in the cache is evaluated and stored together with its maximum load and the associated link. There are three cases where a cache hit can occur:

- The best chromosome is always copied unchanged to the next generation (elitism), where it will be evaluated again.

- Sometimes the same chromosome is selected as both parents for crossover. Crossover has no effect in such cases and simply returns two identical clones of the parent chromosome. Since the mutation rate is low, both children may also pass the mutation step and eventually reach the new population without having been modified.

- A chromosome is simply re-created by chance. Such cases are explicitly dealt with in, e.g., tabu search, but not in GAs.

The original motivation for the cache was just to save some CPU time in the cases described above, because the objective function is by far the most time-consuming function of the optimizer, especially for large networks³. We did not expect a high cache hit ratio, since this would imply that many points in the search space are visited repeatedly, indicating badly chosen parameters such as the probability of random mutation. Surprisingly, this assumption turned out to be incorrect, as we will show in the following section.
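The evaluation cache can be as simple as a dictionary keyed by the weight vector. The class below is a hypothetical sketch. It memoizes any objective function, imposes no size limit, and counts lookups so that a hit ratio can be reported as in Sec. 4.

```python
class EvaluationCache:
    """Memoizes an objective function on the weight vector (unbounded, illustrative)."""

    def __init__(self, objective):
        self.objective = objective
        self.results = {}   # tuple(weights) -> (max_load, most_loaded_link)
        self.lookups = 0
        self.hits = 0

    def __call__(self, weights):
        key = tuple(weights)          # lists are unhashable, so key on a tuple
        self.lookups += 1
        if key in self.results:
            self.hits += 1
        else:
            self.results[key] = self.objective(weights)
        return self.results[key]

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0
```

Storing the most loaded link alongside the load also lets directed mutation reuse the remembered link for a chromosome served from the cache.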

4 Evaluation 

We implemented both strategies for the local heuristic, inner loop and directed mutation, in C++. Rank selection [14] is used as the selection scheme: the n chromosomes are sorted according to their fitness, and the selection probability $p_i$ depends on the rank $i$ as follows:

$$p_i = \frac{2n - i}{\sum_{k=1}^{n} (2n - k)}, \qquad i = 1, \dots, n,$$

which makes the probability of the worst chromosome approximately half the probability of the best chromosome. N-point crossover is used, where the number of split points N is half the number of directed links.
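The rank-based probabilities can be computed and used as in the following sketch. The indexing from i = 1 (best) to i = n (worst) is our reading of the formula. With it, the ratio between the worst and the best probability is n/(2n − 1), e.g. 21/41 ≈ 0.51 for the population size of 21 used here. The function names are hypothetical; `random.Random.choices` is the standard-library weighted sampler.

```python
import random

def rank_probabilities(n):
    """p_i = (2n - i) / sum_{k=1..n} (2n - k) for ranks i = 1 (best) .. n (worst)."""
    total = sum(2 * n - k for k in range(1, n + 1))
    return [(2 * n - i) / total for i in range(1, n + 1)]

def rank_select(ranked, rng):
    """Pick one chromosome from a population sorted best-first."""
    return rng.choices(ranked, weights=rank_probabilities(len(ranked)), k=1)[0]
```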

All runs are performed with a population of 21 chromosomes. The initial population contains 20 random chromosomes with weights uniformly distributed in the range [1, 30], and one chromosome with all weights equal to 15. Although the upper bound (30) is considerably less than the maximum weight allowed in, e.g., OSPF ($2^{16} - 1$), it significantly shrinks the search space without restricting it so much that good solutions are excluded. The inner loop and directed mutation operations, however, may exceed this limit if required.
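The initial population described above can be generated as in the sketch below. The function name is illustrative. Inclusive integer weights via `random.Random.randint` are our assumption, since the text says only that the weights are uniformly distributed in [1, 30].

```python
import random

def initial_population(num_links, rng, size=21, low=1, high=30, uniform_weight=15):
    """size - 1 random chromosomes with weights in [low, high], plus one all-equal chromosome."""
    population = [[rng.randint(low, high) for _ in range(num_links)]
                  for _ in range(size - 1)]
    population.append([uniform_weight] * num_links)
    return population
```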



3 The complexity of the (non-incremental) objective function is $O(V^2 \log V + VE)$ with a Fibonacci heap in Dijkstra's shortest path algorithm.
Table 1. Investigated Networks

Network   Nodes   Link Weights   Generations   Evaluations   Capacities
Cost         11             52           700         14700   inhomogeneous
Labnet       20            106          1500         31521   homogeneous
GT40         40            344          2000         42021   homogeneous


The crossover probability differs between the two strategies. For the inner loop, it is set to 1.0, which means that crossover is always applied. For directed mutation, the crossover probability is set to the considerably smaller value 0.3, because we want to increase the opportunities for directed mutation, as described in Sec. 3.1. Both values have been determined by repeated experiments with different networks, so each strategy operates at its favorable working point. Both strategies use the evaluation cache.

The mutation probability for random mutation in both strategies is one of the parameters we want to study and lies in the range [0.01, 0.1]. Directed mutation in the second strategy is always applied when possible, i.e., when crossover is not performed, so the probability of directed mutation is 1 − 0.3 = 0.7.

We want to compare the performance of both strategies in terms of resulting 
maximum load and number of evaluations, i.e., calls to the objective function 
including cache hits. To give both strategies the same budget of evaluations, 
a run is terminated after a given number of evaluations has been performed 
(instead of generations). The last population is always allowed to complete, 
which constitutes a negligible bias in favor of the inner loop strategy. 
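Terminating on an evaluation budget rather than a generation count can be sketched as follows. Here `next_generation` is a hypothetical callable that returns the new population and the number of evaluations it consumed (cache hits included, as in the text). The loop lets the generation in progress finish, which is why the final count can slightly exceed the budget.

```python
def run_with_budget(population, next_generation, budget):
    """Run generations until at least `budget` evaluations have been performed.

    next_generation(population) -> (new_population, evaluations_used).
    Returns (final_population, evaluations_used, generations_run).
    """
    used = generations = 0
    while used < budget:
        population, evals = next_generation(population)
        used += evals
        generations += 1
    return population, used, generations
```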

Information about the investigated networks is given in Tab. 1. The gener- 
ation numbers (quotient of evaluations and population size) correspond to the 
number of evaluations made for the directed mutation strategy. Networks with 
equal capacities for all links are called homogeneous. 

The results are depicted in Fig. 2, Fig. 3 and Fig. 4. For each network, the left diagram (a) shows the maximum load as a function of the mutation rate, the middle diagram (b) shows the cache hit ratio in percent, and the right diagram (c) shows the running times in CPU seconds on a Pentium III Linux PC (800 MHz, 256 MB). Each diagram shows the curve for the inner loop (IL) and for directed mutation (DM). Each data point represents the mean of 50 repeated experiments⁴ with the given mutation rate but different random seeds; the error bars give the 95% confidence interval.

Except for the 20-node network, directed mutation always yields a lower maximum load than the inner loop strategy. For the 20-node network, the difference is not significant for mutation rates above 0.04. For mutation rates greater than 0.02, directed mutation always leads to much higher cache hit ratios and therefore requires much less CPU time. Directed mutation not only yields better results but is also much more efficient in terms of calls to the objective function. The latter holds for all three test networks.

4 For the 20-node network 200 samples were taken. 
Fig. 2. 11-node network with 14700 evaluations: (a) maximum load, (b) cache hit ratio, (c) running time.

Fig. 3. 20-node network with 31521 evaluations: (a) maximum load, (b) cache hit ratio, (c) running time.




If incremental algorithms are used for the objective function, different calls to the objective function require different amounts of CPU time. We did not analyze this difference further, but note that directed mutation would also benefit from incremental algorithms, since 70% of all chromosomes are subject only to random and directed mutation. The small number of weight changes in such cases makes incremental algorithms attractive for the directed mutation strategy as well.
5 Discussion

The fact that caching with high hit ratios improves CPU performance is welcome but not very surprising. However, cache hit ratios also tell us something about the search process. High cache hit ratios clearly indicate that the GA visits many points in the search space more than once. Yet the results for directed mutation are never worse, and often significantly better, than those of other strategies with lower cache hit ratios. This implies that simply exploring more potential solutions does not increase the quality of the result!

For the inner loop, we showed in Sec. 3 that many evaluations are wasted by 
exploring nearby local minima. 

Greater random diversification through higher mutation rates also wastes 
evaluations. The reason behind this phenomenon seems to be the error threshold, 
as in biological populations. For asexual (no crossover) biological populations,
the error threshold for the mutation rate is well known [18] [19]. Mutation is
essentially an error occurring during the “transmission” of a chromosome to the 
next generation. Simply stated, if the mutation rate for a population is too low, 
the population evolves slowly. If the mutation rate is above the error threshold, 
the population dies out. The optimal value for the mutation rate is just below 
the error threshold, to evolve as quickly as possible. Although the population
cannot die out in a genetic algorithm, since the population size is fixed by design,
the error threshold remains effective in the sense that there is a mutation rate 
above which results become worse. 

On the other hand, we find it strange that the best of the investigated strategies
creates each chromosome about twice, which wastes half the evaluations.
While caching alleviates this effect and leads to better CPU performance,
we find it hard to believe that a strategy which creates each potential solution 
twice should be optimal. 



6 Conclusion 

This work revisited genetic algorithms for link weight optimization together with
a local heuristic that increments the weight of the most loaded link. We analyzed
the current implementation strategy of the heuristic and found that it is still
expensive in terms of calls to the objective function. Based on this analysis, an
implementation strategy called directed mutation was proposed and evaluated
together with caching. The new approach outperforms the current strategy both
in the quality of the result and in the number of calls to the objective function,
i.e., CPU usage.
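The local heuristic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `link_loads` is a hypothetical routine that routes the demands over the shortest paths induced by the weights and returns one load per link, and the increment `step` and upper bound `w_max` are placeholders rather than values from this work.

```python
from typing import Callable, List, Sequence

def directed_mutation(
    weights: Sequence[int],
    link_loads: Callable[[Sequence[int]], Sequence[float]],
    step: int = 1,
    w_max: int = 20,
) -> List[int]:
    """Return a copy of `weights` in which the most loaded link is made more expensive.

    `link_loads` is hypothetical: it should route all demands over the shortest
    paths induced by `weights` and return one load value per link.
    `step` and `w_max` are illustrative defaults, not values from the paper.
    """
    loads = link_loads(weights)
    # Index of the link carrying the highest load (the first one on ties).
    hottest = max(range(len(loads)), key=lambda i: loads[i])
    mutated = list(weights)  # never modify the parent chromosome in place
    # Raising the weight tends to steer shortest paths away from this link.
    mutated[hottest] = min(mutated[hottest] + step, w_max)
    return mutated

print(directed_mutation([3, 3, 3], lambda w: [0.2, 0.9, 0.5]))  # [3, 4, 3]
```

Whether a single increment is enough to shift traffic off the link depends on the topology and the weights of competing paths; the sketch only shows the direction of the change.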

There is still room for further improvements of this approach. For example,
using the cache as a tabu list, not only to return known results from the cache
but to force the creation of genuinely new chromosomes, might further improve
performance for optimizations in this area.
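One way the tabu-list idea might be realized is sketched below in Python: mutation is retried until it produces a chromosome that is not already known. The function `fresh_offspring`, the retry bound `max_tries`, and the example operator `point_mutation` (with its weight range 1..20) are all hypothetical; this is only one possible reading of the suggestion.

```python
import random
from typing import Callable, Optional, Set, Tuple

Chromosome = Tuple[int, ...]  # one weight per link

def point_mutation(c: Chromosome, rng: random.Random) -> Chromosome:
    """Hypothetical example operator: redraw one randomly chosen weight in 1..20."""
    i = rng.randrange(len(c))
    return c[:i] + (rng.randint(1, 20),) + c[i + 1:]

def fresh_offspring(parent: Chromosome,
                    mutate: Callable[[Chromosome, random.Random], Chromosome],
                    tabu: Set[Chromosome],
                    rng: random.Random,
                    max_tries: int = 10) -> Optional[Chromosome]:
    """Mutate `parent` until the child is not tabu (i.e. not already cached)."""
    for _ in range(max_tries):
        child = mutate(parent, rng)
        if child not in tabu:
            tabu.add(child)  # mark it so it will not be generated again
            return child
    return None  # give up; the caller might fall back to a random chromosome

seen = {(5, 5, 5)}
child = fresh_offspring((5, 5, 5), point_mutation, seen, random.Random(1))
```

In practice `tabu` could be a set maintained alongside the evaluation cache, so that the same structure serves both purposes.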
Abstract. In optical transport networks, the algorithms dealing with the lightpath selection process select routes and assign wavelengths based on the routing information obtained from the network state databases. Unfortunately, due to several factors, in large dynamic networks this routing information may not be accurate enough to support successful routing decisions. In this paper we suggest a new prediction-based routing mechanism in which lightpaths are selected based on prediction decisions. Consequently, routing information is not required at all, and neither is updating it. In short, the signaling overhead produced by the updating process is practically removed.

Keywords: Optical routing, routing inaccuracy, prediction-based routing. 



1 Introduction 

Internet traffic demands have grown extensively in recent years due to real-time applications such as video, multimedia conferences or virtual reality. Optical wavelength-division multiplexing (WDM) networks are able to provide high bandwidth to support these growing traffic demands. Unlike traditional IP networks, where the routing process only involves selecting a physical path, in WDM networks the routing process involves both physical path selection and wavelength assignment, i.e., the routing and wavelength assignment (RWA) problem. The RWA problem is often tackled by dividing it into two sub-problems: the routing sub-problem and the wavelength assignment sub-problem. The first approach to the routing sub-problem in a WDM network always selects the same route between each source-destination node pair, which is known as static routing. In fixed-shortest-path routing, for example, this route is computed with Dijkstra's algorithm [1] or the Bellman-Ford algorithm. However, since the performance of fixed-shortest-path routing is limited, Fixed-Alternate routing was proposed [2], in which more than one fixed route is computed for every source-destination node pair. For each new connection request, the routing algorithm tries to send the traffic through the computed fixed routes in sequence. This substantially reduces the number of blocked connections with respect to fixed-shortest-path routing.
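Fixed-Alternate routing amounts to trying a fixed, precomputed list of routes in order. The Python sketch below assumes the routes are already computed (for instance with Dijkstra's algorithm) and abstracts the wavelength check into a hypothetical predicate `can_carry`; it illustrates the control flow only.

```python
from typing import Callable, Optional, Sequence

Route = Sequence[str]

def fixed_alternate_route(routes: Sequence[Route],
                          can_carry: Callable[[Route], bool]) -> Optional[int]:
    """Try the precomputed routes of a node pair in their fixed order.

    `routes` is the ordered list (e.g. shortest first) computed offline;
    `can_carry` is a placeholder for whatever feasibility test the wavelength
    assignment step applies. Returns the index of the route used, or None if
    the request is blocked on every route.
    """
    for index, route in enumerate(routes):
        if can_carry(route):
            return index
    return None

# Example: the shortest route A-B-C is full, so the alternate A-D-C is used.
full_links = {("B", "C")}
routes = [["A", "B", "C"], ["A", "D", "C"]]
chosen = fixed_alternate_route(
    routes, lambda r: all((u, v) not in full_links for u, v in zip(r, r[1:])))
print(chosen)  # 1
```

Fixed-shortest-path routing is the special case of a one-element route list, which is why it blocks more often.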

The main problem of static routing is that it does not consider the current network state. Hence, the second approach to the routing sub-problem in WDM networks is dynamic (or adaptive) routing, which selects routes based on the current network state. There are different approaches in this scenario, such as adaptive shortest-cost-path routing and the Least-Congested Path (LCP) algorithm [3]. Although LCP performs better than Fixed-Alternate routing, it is worth noting that adaptive routing requires source nodes to continuously receive update messages about changes in the network state.
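A Python sketch of Least-Congested Path selection follows, using one common formulation in which the congestion of a link is its number of free wavelengths and the congestion of a path is that of its most congested link. The `free` map is exactly the state that adaptive routing must keep current through update messages. This link-count formulation ignores wavelength continuity; a WS-aware variant could count only the wavelengths free on every link of the route. All names are illustrative.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

Link = Tuple[str, str]

def least_congested_path(routes: Sequence[Sequence[str]],
                         free: Dict[Link, FrozenSet[int]]) -> Optional[int]:
    """Index of the preselected route whose most congested link has the most
    free wavelengths, or None if every route contains a fully occupied link.
    Each route is assumed to have at least two nodes."""

    def bottleneck(route: Sequence[str]) -> int:
        # Path congestion = free-wavelength count of its most congested link.
        return min(len(free.get(link, frozenset())) for link in zip(route, route[1:]))

    if not routes:
        return None
    best = max(range(len(routes)), key=lambda i: bottleneck(routes[i]))  # ties: earlier route
    return best if bottleneck(routes[best]) > 0 else None

routes = [["A", "B", "C"], ["A", "D", "C"]]
free = {("A", "B"): frozenset({0, 1, 2}), ("B", "C"): frozenset({0}),
        ("A", "D"): frozenset({0, 1}), ("D", "C"): frozenset({1, 2})}
print(least_congested_path(routes, free))  # 1 (bottlenecks: 1 vs 2 free wavelengths)
```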

Given a set of established routes for a set of lightpaths, the static wavelength assignment sub-problem consists of assigning a wavelength to each route. In this paper we focus on Wavelength Selective (WS) networks, that is, networks without wavelength conversion capabilities. In WS networks, routes sharing the same link (or links) must use different wavelengths, and the same wavelength must be assigned to a lightpath on all the links of its route.
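The wavelength continuity constraint can be expressed as a set intersection: a wavelength is usable for a lightpath only if it is free on every link of the route. The Python sketch below assumes link state is held as a map from directed link to the set of free wavelength indices (a hypothetical representation); the second example shows a route that is blocked even though each of its links has a free wavelength.

```python
from typing import Dict, FrozenSet, Sequence, Set, Tuple

Link = Tuple[str, str]

def continuous_wavelengths(route: Sequence[str],
                           free: Dict[Link, FrozenSet[int]]) -> Set[int]:
    """Wavelengths free on every link of `route` (no wavelength conversion)."""
    links = list(zip(route, route[1:]))
    if not links:
        return set()
    common = set(free.get(links[0], frozenset()))
    for link in links[1:]:
        common &= free.get(link, frozenset())  # continuity: must be free everywhere
    return common

free = {("A", "B"): frozenset({0, 1}), ("B", "C"): frozenset({1, 2})}
print(continuous_wavelengths(["A", "B", "C"], free))  # {1}

# Each link has a free wavelength, but none is common: the request is blocked.
free = {("A", "B"): frozenset({0}), ("B", "C"): frozenset({1})}
print(continuous_wavelengths(["A", "B", "C"], free))  # set()
```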

When connection requests arrive as incremental or dynamic traffic, heuristic methods are used to assign wavelengths to the lightpaths. In this case the number of available wavelengths is assumed to be fixed. A large number of heuristic algorithms have been proposed in the literature, as shown in [4], such as Random, First-Fit, Least-Used, Most-Used, Min-Product, Least-Loaded, Max-Sum, Relative Capacity Loss, Protecting Threshold, and Distributed Capacity Loss.
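Once the set of wavelengths satisfying continuity on the chosen route is known, several of the listed heuristics reduce to a simple selection rule. The Python sketch below covers Random, First-Fit, Most-Used and Least-Used, assuming that "usage" means the number of links on which a wavelength is currently in use; breaking ties towards the lowest index is also an assumption.

```python
import random
from typing import Dict, Optional, Set

def first_fit(candidates: Set[int]) -> Optional[int]:
    """Lowest-indexed wavelength among those free along the whole route."""
    return min(candidates) if candidates else None

def random_fit(candidates: Set[int], rng: random.Random) -> Optional[int]:
    """Uniformly random choice among the candidates."""
    return rng.choice(sorted(candidates)) if candidates else None

def most_used(candidates: Set[int], usage: Dict[int, int]) -> Optional[int]:
    """Candidate already in use on the most links (packs traffic onto few wavelengths)."""
    if not candidates:
        return None
    return max(sorted(candidates), key=lambda w: usage.get(w, 0))

def least_used(candidates: Set[int], usage: Dict[int, int]) -> Optional[int]:
    """Candidate in use on the fewest links (spreads load across wavelengths)."""
    if not candidates:
        return None
    return min(sorted(candidates), key=lambda w: usage.get(w, 0))
```

First-Fit needs no global state at all, which is one reason it is a common baseline; Most-Used and Least-Used depend on network-wide usage counts and hence on up-to-date state information.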

Most RWA solutions proposed in the recent literature use distributed mechanisms based on source routing. In this scenario the routing inaccuracy problem (RIP) arises. The RIP describes the impact on global network performance of taking RWA decisions based on inaccurate routing information. In highly dynamic networks, inaccuracy is mainly due to the aggregation of routing information in the update messages, the frequency of updating the network state databases, and the latency associated with the flooding process. It is worth noting that the first two factors are negative side effects of measures introduced to reduce the signaling overhead produced by the large number of update messages required to keep routing information accurate. It has been clearly shown [5] that the routing inaccuracy problem, that is, selecting a path based on outdated network state information, may significantly degrade global network performance by increasing the number of blocked connection requests.
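To make the inaccuracy mechanism concrete, the Python sketch below models a source node's copy of the network state that is refreshed only every `interval` cycles, as with periodic update messages; between refreshes the node routes on a stale snapshot. `PeriodicTED` and its methods are hypothetical names, and real link-state protocols trigger and flood updates in more elaborate ways.

```python
from typing import Dict, FrozenSet, Tuple

Link = Tuple[str, str]

class PeriodicTED:
    """Source-node view of link state that is refreshed only every `interval` cycles.

    Between refreshes the source routes on the last snapshot, so its view can
    disagree with the real network: this gap is the routing inaccuracy.
    """

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.snapshot: Dict[Link, FrozenSet[int]] = {}
        self.updates = 0  # number of refreshes, a proxy for signaling overhead

    def tick(self, cycle: int, true_state: Dict[Link, FrozenSet[int]]) -> None:
        if cycle % self.interval == 0:
            # Values are immutable frozensets, so a shallow copy is a true snapshot.
            self.snapshot = dict(true_state)
            self.updates += 1

    def is_stale(self, true_state: Dict[Link, FrozenSet[int]]) -> bool:
        return self.snapshot != true_state
```

A shorter interval keeps the snapshot closer to reality at the cost of more updates, which is exactly the trade-off between blocking and signaling overhead described above.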

In this paper we propose prediction-based routing as a mechanism that addresses not only the RWA problem but also the RIP, achieving a drastic reduction in the signaling overhead. In short, the prediction-based routing mechanism selects routes based not on 'old' or inaccurate network state information but on some kind of 'predicted' information. Hence, since routing information from the network state databases is not required, we may eliminate the need for flooding update messages (except those required for connectivity).

The remainder of this paper is organized as follows. Section 2 reviews the most significant contributions in the recent literature on the Routing Inaccuracy Problem. Section 3 proposes the Predictive Routing Algorithm, Section 4 evaluates our proposal, and Section 5 concludes the paper.
2 Handling the Routing Inaccuracy Problem

Most dynamic RWA algorithms assume that the network state databases (called Traffic Engineering Databases, TEDs, when they include QoS attributes) contain accurate information on the current network state. Unfortunately, when this information is not kept perfectly up to date, source nodes may take wrong routing decisions, significantly increasing connection blocking (i.e., the routing inaccuracy problem). The most recent related work is summarized in the following paragraphs.

In [5], the effect of inaccurate routing information on the blocking probability when selecting lightpaths is shown by simulation. The authors verify over a fixed topology that the blocking ratio increases when routing is performed with inaccurate routing information; the routing uncertainty is introduced by adding an update interval of 10 seconds. Further simulations show the effect on the blocking ratio of changing the number of fibers on all the links. Finally, the authors argue that new Routing and Wavelength Assignment (RWA) algorithms that can tolerate imprecise global network state information must be developed for dynamic connection management in WDM networks.

In [6], the routing inaccuracy problem is addressed by modifying the lightpath control mechanism, and a new distributed lightpath control based on destination routing is suggested. The mechanism both selects the physical route and wavelength at the destination node and adds rerouting capabilities to the intermediate nodes, to avoid blocking a connection when the selected wavelength is no longer available at set-up time at some intermediate node along the lightpath. This mechanism has two main weaknesses. Firstly, since rerouting is performed in real time during the set-up process, the deterioration in wavelength usage is directly proportional to the number of intermediate nodes that must reroute the traffic. Secondly, the signaling overhead is not reduced, since the RWA decision is based on the global network state information maintained at the destination node, which must be perfectly up to date.

Another contribution on this topic can be found in [7], where the authors propose a mechanism whose goal is to control the number of signaling messages flooded throughout the network. Assuming that update messages are sent according to a hold-down timer regardless of the frequency of network state changes, the authors propose a dynamic distributed bucket-based Shared Path Protection scheme (an extension of the Shared Path Protection, SPP, scheme). The signaling overhead is thus limited both by a constant hold-down timer, which limits the number of update messages flooded throughout the network, and by buckets, which limit the amount of information stored at the source node, i.e., the amount of information to be flooded by nodes. The effects of the resulting inaccuracy are handled by computing alternative disjoint lightpaths, which act as protection lightpaths when the resources on the working path are insufficient for the incoming connection. The authors show by simulation that inaccurate database information strongly affects connection blocking, and that this increase in blocking may be limited by choosing a suitable update frequency. According to the authors, simulation results obtained by applying the proposed scheme together with a modified version of the OSPF protocol may help network operators determine the update frequency that best balances connection blocking against signaling overhead.
In [8], the authors propose a new adaptive source routing mechanism named BYPASS Based Optical Routing (BBOR), aiming to reduce the effects of routing inaccuracy, i.e., increased blocking probability and non-optimal path selection, in WS networks. In [9], the authors extend the mechanism to networks with conversion capabilities. The BBOR mechanism is based on bypassing those links that cannot forward the setup message because they lack the selected wavelength. The bypass is achieved by forwarding the setup message through a precomputed alternative path (bypass-path).



3 New Proposal of Prediction-Based Routing 

The main idea of the Prediction-based Routing (PBR) mechanism is to extend the concepts of branch prediction used in computer architecture [10]. In that field, there are several methods to predict the direction of branch instructions. Branch prediction does not rely on exact knowledge of the processor state but on the past behavior of branch instructions. A prediction can be either right or wrong, and the goal is to maximize the number of correct predictions. Following this idea, the PBR mechanism predicts the route and wavelength assignment between two nodes according to the routing information obtained from previous connection set-ups. Thus, PBR avoids using inaccurate network state information from the Traffic Engineering databases, removing the need for frequent updating. Note that minimal updating, reporting only link/node availability, is still required to ensure connectivity.

The main objective is to optimize the routing decision by considering the state 'history' of each path, that is, every source node must keep previous information about both the wavelength and the route allocated to each path established between itself and a destination node. This history is recorded continuously over time in a history register, which serves as a pattern of behavior used to train a table named the Prediction Table (PT).

Note that, in order to generate the history, every source node must keep not only the latest information but also previous information on the wavelengths and routes used. From all this information it builds an index into the PT. The PT has several entries, each holding a counter that represents a different history pattern. The prediction is obtained by reading the counter value from the table, and the counters are updated (increased or decreased) so that the table learns [10].



3.1 Wavelength History Registers 

Before defining a prediction algorithm, we must introduce the parameter that determines when the history registers may be modified. We define a cycle as the basic unit of time in which the history state may be modified.

As mentioned above, every source node must know the history state information, which is therefore kept in history registers. There is one such register for every wavelength on every path to every destination node; we call these registers wavelength registers (WR).

We propose recording the history of the network state in every source node as follows: in each cycle, each WR is updated with a 0 if its wavelength on its path is used in that cycle; otherwise, the register of the unused wavelength on that path is updated with a 1. Note that “a path is used” means that it has been selected by the prediction algorithm and the decision was right, since the path is available. Conversely, “a path is unused” when no incoming connection is assigned to it.
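Since each WR records one bit per cycle and, in the evaluation, keeps the last 5 cycles, it behaves like a shift register. A minimal Python sketch follows, assuming the register starts at all zeros (the initial state is not specified in the text) and that the newest cycle occupies the least significant bit; the class name is illustrative.

```python
class WavelengthRegister:
    """History of one wavelength on one path: bit 0 = used in a cycle, 1 = unused."""

    def __init__(self, bits: int = 5) -> None:
        self.bits = bits
        self.value = 0  # initial history; starting from all zeros is an assumption

    def record(self, used: bool) -> None:
        """Shift in the outcome of the current cycle, dropping the oldest bit."""
        mask = (1 << self.bits) - 1
        self.value = ((self.value << 1) | (0 if used else 1)) & mask

    def __str__(self) -> str:
        return format(self.value, f"0{self.bits}b")

wr = WavelengthRegister()
for used in (False, True, False, False, True):
    wr.record(used)
print(wr)  # 10110
```

The integer `value` is what would later serve as the index into the corresponding prediction table.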



3.2 Prediction Tables 

The prediction tables are the basis for predicting a wavelength and a path. In the source nodes, one prediction table, PT, is needed for every feasible circuit between any source-destination node pair. The prediction table for a wavelength on a path is accessed with an index obtained from the corresponding WR. For example, suppose a source node sends traffic towards two different destination nodes and every source-destination node pair has two different paths (two shortest paths). If we further assume 6 wavelengths, then 2 × 2 × 6 = 24 PTs are needed at the source node, one for every destination, path and wavelength. Every source node has as many wavelength registers as prediction tables.

Every entry in the prediction tables holds a counter, which is read when the table is accessed and compared to a threshold value. If the counter is below the threshold, the prediction is to accept the request on this wavelength and path; otherwise, the path is predicted to be unavailable. The counters are two-bit saturating counters, where values 0 and 1 indicate availability and values 2 and 3 indicate path unavailability [10]. Using two values each for availability and unavailability has been well studied in the area of branch prediction: as shown in [10], a two-bit counter gives better accuracy than a one-bit counter. A one-bit counter simply predicts what happened last time: if the last traffic request with this history was blocked, the next time the history repeats the prediction will be unavailability, and if it was accepted, the prediction will be availability. With a two-bit counter, by contrast, the traffic request must have been blocked (or accepted) twice for the same history to change the direction of the prediction. It is also shown in [10] that counters larger than two bits do not necessarily give better results, because of the “inertia” that a large counter can build up: more than two changes in the same direction are then needed to change the prediction. A saturating counter does not wrap around: when its value is 0 and it is decreased, it stays at 0, and when its value is 3 and it is increased, it stays at 3.
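The counter behavior just described can be sketched as a small Python class, assuming a threshold of 2 (consistent with the 0/1 versus 2/3 split) and an initial value of 0, which the text does not specify. `accepted` and `blocked` are hypothetical method names for the decrease and increase operations.

```python
class SaturatingCounter:
    """2-bit saturating counter: 0-1 predict 'available', 2-3 predict 'unavailable'."""

    MAX = 3
    THRESHOLD = 2

    def __init__(self, value: int = 0) -> None:
        self.value = value  # initial value is an assumption

    def predicts_available(self) -> bool:
        return self.value < self.THRESHOLD

    def accepted(self) -> None:
        """Connection set up: move towards 'available', saturating at 0."""
        self.value = max(self.value - 1, 0)

    def blocked(self) -> None:
        """Connection blocked: move towards 'unavailable', saturating at 3."""
        self.value = min(self.value + 1, self.MAX)

c = SaturatingCounter()
c.blocked()
print(c.predicts_available())  # True  (value 1: one block does not flip it)
c.blocked()
print(c.predicts_available())  # False (value 2: second block flips the prediction)
```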

As explained above, the source nodes hold one prediction table, PT, for every wavelength on every path and for every destination. The tables must be updated with the same index used for the prediction. When a new connection request is set up, the table of the selected wavelength and path is updated by decreasing the counter; when the connection request is blocked, the counter is increased. The tables of the unused paths are not updated. Note that when a connection request is set up, only the prediction table of the wavelength and path used is updated, but all the wavelength registers corresponding to that destination are updated: those of the used wavelength with 0 and those of the unused wavelengths with 1.

It is worth noting that the prediction tables at the source nodes are updated as soon as the prediction has been made and it is known whether the connection request was set up or blocked. For this reason it is not necessary to flood update messages through the network to update the network state databases.
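Putting these update rules together, the following Python sketch keeps, for one destination, a WR and a PT per (path, wavelength) pair. It is a sketch under assumptions: registers and counters start at 0, the history length defaults to 5 bits, and the class and method names are illustrative. Note the ordering: the PT entry is updated with the index used for the prediction before the WRs shift in the new cycle.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (path index, wavelength index)

class DestinationPredictor:
    """Per-destination WRs and PTs kept at a source node (illustrative sketch)."""

    def __init__(self, paths: int, wavelengths: int, history_bits: int = 5) -> None:
        self.mask = (1 << history_bits) - 1
        self.keys = [(p, w) for p in range(paths) for w in range(wavelengths)]
        self.wr: Dict[Key, int] = {k: 0 for k in self.keys}
        # One PT per (path, wavelength): 2**history_bits two-bit counters.
        self.pt: Dict[Key, List[int]] = {k: [0] * (1 << history_bits) for k in self.keys}

    def counter(self, key: Key) -> int:
        return self.pt[key][self.wr[key]]  # the WR value is the PT index

    def update(self, selected: Key, set_up: bool) -> None:
        # 1) Only the PT of the selected (path, wavelength), at the index used
        #    for the prediction: decrease on set-up, increase on blocking.
        table, idx = self.pt[selected], self.wr[selected]
        table[idx] = max(table[idx] - 1, 0) if set_up else min(table[idx] + 1, 3)
        # 2) Every WR of this destination records the cycle: 0 = used, 1 = unused.
        for k in self.keys:
            used = set_up and k == selected
            self.wr[k] = ((self.wr[k] << 1) | (0 if used else 1)) & self.mask
```

Because all of this happens locally at the source node, no message needs to leave the node for the tables to learn.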



3.3 RWA Prediction Algorithm 

We define a new RWA prediction algorithm derived from the PBR mechanism, Route and Wavelength Prediction (RWP), which uses the information contained in the prediction tables to decide which path and wavelength to select. When a new request for a connection to a destination node arrives at the source node, all the prediction tables of that destination are accessed. Recall that one prediction table, PT, and one wavelength register, WR, exist for every wavelength on every path to every destination. We assume that two shortest paths, SP1 and SP2, are computed for every source-destination node pair. Each prediction table is accessed with an index built from the wavelength history contained in the corresponding WR, and reading the prediction tables yields the 2-bit counters. As an example, Fig. 1 shows the accesses to the existing PTs for one shortest path (either SP1 or SP2), and Fig. 2 shows the RWP flow chart, assuming W wavelengths on every link. The RWP algorithm always starts by considering the counter of the PT of the first wavelength on the first shortest path, SP1. If the counter is less than 2 (i.e., 0 or 1) and this wavelength is free on the node's outgoing link towards SP1, the prediction algorithm decides to use this wavelength on this path. Otherwise (counter equal to 2 or 3, or outgoing link not available), this wavelength is not used, and the counter of the next PT, which corresponds to the second wavelength on SP1, is examined. When the PTs of all the wavelengths of SP1 have been examined without success, that is, every wavelength either has a counter of 2 or more or is not available on the outgoing link towards SP1, the prediction algorithm checks the PTs of the next path, SP2, and so on. If, after checking all PTs, the prediction algorithm concludes that all the feasible wavelengths on the two paths are blocked, it tries to forward the connection request through the first available wavelength on the outgoing link towards one of the two shortest paths, SP1 or SP2. The state of its own outgoing links is always known to the source node.

Fig. 1. Example of Prediction Table access and values of the 2-bit counters. [Figure: the history in the WR of each wavelength λ1, λ2, ..., λn (e.g., 0101101) indexes that wavelength's table PTλ1, PTλ2, ..., PTλn.]

Fig. 2. RWP flow chart

Wavelength registers (WR) are updated depending on which wavelength is used and whether or not the request is blocked. The prediction table, PT, of the used wavelength and path is also updated, by increasing the counter of the corresponding entry if the connection is blocked or decreasing it if the connection is not blocked.

It is worth noting that the counters of every wavelength on all the feasible paths between a source-destination node pair can be read, and hence the prediction made, before a new connection request reaches the source node. This significantly reduces the cost of the PBR mechanism: even though several tables must be accessed to make a prediction, these accesses can be done offline, so that for every possible new request the decision of which path to use has already been made.
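The RWP decision procedure can be sketched in Python as two passes over the (path, wavelength) pairs. The callbacks `counter` and `outgoing_free` are hypothetical stand-ins for reading a PT entry and for the source node's local knowledge of its outgoing links; scanning SP1 before SP2 in the fallback pass is an assumption, since the text only says "one of the two shortest paths".

```python
from typing import Callable, Optional, Tuple

def rwp_select(num_paths: int, num_wavelengths: int,
               counter: Callable[[int, int], int],
               outgoing_free: Callable[[int, int], bool]) -> Optional[Tuple[int, int]]:
    """Return (path, wavelength) chosen by the RWP flow, or None if nothing is free.

    counter(p, w): 2-bit counter read from the PT of wavelength w on path p.
    outgoing_free(p, w): whether w is free on the source's outgoing link towards p,
    the only link state the source node knows locally.
    """
    # Pass 1: SP1 first, then SP2, ...; accept the first wavelength predicted
    # available (counter 0 or 1) that is also free on the outgoing link.
    for p in range(num_paths):
        for w in range(num_wavelengths):
            if counter(p, w) < 2 and outgoing_free(p, w):
                return p, w
    # Pass 2: every wavelength is predicted blocked; fall back to the first
    # wavelength free on an outgoing link (path order here is an assumption).
    for p in range(num_paths):
        for w in range(num_wavelengths):
            if outgoing_free(p, w):
                return p, w
    return None

counters = {(1, 1): 0}  # every other PT entry reads 3 ("unavailable")
print(rwp_select(2, 2, lambda p, w: counters.get((p, w), 3), lambda p, w: True))  # (1, 1)
```

Since the counters do not depend on the incoming request, the result of pass 1 can be precomputed offline and merely checked against the outgoing links when a request arrives.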



4 Performance Evaluation 

We have developed a tool to evaluate the performance of Prediction-Based Routing. Simulations are obtained by applying PBR to a test topology of 15 nodes and 27 links, with 2 source nodes and 2 destination nodes. All nodes are connected by single-fiber links, and the number of wavelengths (lambdas) varies between 2 and 5. Connection arrivals are modeled as a Poisson process, and each connection requires a full wavelength on each link it traverses. Each WR keeps information about the last 5 cycles (5 bits), so each PT has 32 entries of 2 bits. To illustrate the capacity overhead, in bits, of applying PBR, assume that 2 shortest paths, SP1 and SP2, are computed with 5 lambdas each; for the two destinations there are then 20 PTs in every source node. This amounts to a total of 1280 bits, which can safely be considered negligible.
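The 1280-bit total can be checked term by term, assuming, as the simulation setup implies, that each source node serves two destinations:

```latex
\begin{aligned}
\text{number of PTs} &= \underbrace{2}_{\text{destinations}} \times \underbrace{2}_{\text{paths}} \times \underbrace{5}_{\text{wavelengths}} = 20,\\
\text{bits per PT} &= 2^{5}\ \text{entries} \times 2\ \text{bits} = 64,\\
\text{total} &= 20 \times 64 = 1280\ \text{bits}.
\end{aligned}
```

The WRs themselves (one 5-bit register per PT) would add 20 × 5 = 100 bits, which does not change the conclusion.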

The initial goal is to verify that RWP can learn the network behavior, in terms of routing and wavelength assignment, using the prediction tables. We compare the performance of RWP and the First-Fit algorithm. For First-Fit we vary the update frequency and the number of available wavelengths on every fiber. As before, a cycle is the basic unit of time. Fig. 3 shows the blocking obtained by the First-Fit algorithm over a total of 62000 connection requests, for update intervals of N = 1, 5, 10, 20 and 40 cycles. The Y-axis in Fig. 3 depicts the number of blocked requests, including both requests rejected at an intermediate node and requests blocked for lack of resources during the path selection process. Fig. 3 also shows the effect of varying the number of available wavelengths. The minimum number of blocked requests (1054, i.e., 1.7%) is obtained when N=1 (update messages every cycle) and the number of lambdas is 5. Fig. 4 compares RWP and First-Fit for several lambda values. The results show that from lambda=4 upwards RWP performs better than First-Fit: for lambda=4, RWP yields 287 blocked requests and First-Fit 1066, and for lambda=5 the results are 56 blocked requests for RWP and 1054 for First-Fit.

Fig. 3. Blocked Connection Requests for the First-Fit Algorithm

Fig. 4. First-Fit versus RWP Algorithm

Blocked requests have two origins. The first is when no path is available for a connection request. The second is when the algorithm fails in the route assignment, so that the connection set-up is blocked at an intermediate node.




Fig. 5. Evolution of Blocked Connections Requests for the First-Fit and RWP (x-axis: Number of Requests).
RWP yields fewer blocked requests than First-Fit (e.g., for lambda=4) because First-Fit fails more often in the route assignment, even when update messages reach the source node every cycle (N=1). This happens when two connections are requested at two nodes at the same time: one node assigns its route before the other, so the second node assigns its route using out-of-date network information. This does not happen with RWP, because it is better able to learn which route is best for each request. Fig. 5 presents the evolution of blocked requests (for lambda=4), sampled every 100 new requests from 0 to 2000 requests, for both First-Fit and RWP. Initially the prediction algorithm fails more often (7 and 0 blocked requests in the first 100 requests for RWP and First-Fit, respectively); at 1400 requests the number of blocked requests is equal for both algorithms, and at 2000 requests the results are 27 and 41 for RWP and First-Fit, respectively. We conclude that the prediction algorithm learns from its failures, so that the growth of its blocked requests slows down (approximately logarithmic), whereas the number of blocked requests for First-Fit grows at a constant rate (approximately linear).

It is worth noting that we have compared RWP with First-Fit assuming N=1. However, the signaling overhead implied by this update frequency is well known to be unaffordable. When simulations use more realistic values, for instance N=40, RWP is still much better than First-Fit.



5 Conclusions 

In this paper the authors propose the Prediction-Based Routing (PBR) mechanism to tackle the RWA problem in WDM networks. The main feature of PBR is that it allows source nodes to take routing decisions without using traditional routing information, that is, the network state information contained in their Traffic Engineering databases (TEDs). Two immediate benefits follow from the PBR mechanism. First, PBR removes the update messages required to update the TEDs (only connectivity messages are required), thus significantly reducing the signaling overhead. Second, in highly dynamic networks PBR can efficiently adapt its routing decisions after a training period. Simulation results show that PBR performs better than the First-Fit algorithm even when an update interval of 1 cycle is set for First-Fit.
Abstract. The lack of QoS support in the Internet makes it difficult for service providers to give guarantees regarding the timely delivery and quality of their services. For multimedia services such as video on demand and video conferencing, however, the delay should be minimal. One of the main causes of the absence of QoS is the inter- and intradomain routing scheme in the Internet, which minimizes the hop count instead of optimizing the QoS. In this paper we discuss the construction of an overlay network that is able to meet the requirements of time-critical services. An optimal algorithm for the placement of overlay servers in the Internet is described, together with a number of heuristics. These server placement algorithms are evaluated by comparing the resulting overlay network to the standard Internet in terms of the number of connections accepted and the number of overlay servers required.

Keywords: QoS, Overlay Network, Server Placement, Integer Linear 
Programming. 



1 Introduction 

The usage of the Internet is evolving from HTML and file traffic towards advanced
multimedia service delivery. The current infrastructure lacks the ability to provide
the QoS required by these services. Applications like video on demand and
video conferencing are characterized by a significant need for bandwidth
coupled with minimal delay. These parameters are not taken into account by the
standard Internet routing algorithms, which select the shortest path in terms
of hop count. As a result, the Internet doesn’t offer any guarantees regarding
the delay/bandwidth on a path. Other reasons for the lack of QoS support are
the inability of the Internet to route around congested links and the fact that 
autonomous systems (ASs) often eject packets destined for other ASs early to 
minimize the load on their own network, regardless of the effect this might have 
on the delay [1] . We propose the use of overlay network technology to overcome 
these problems. An overlay network is a network built on top of an existing net- 
work. Overlay networks facilitate the introduction of new network functionality 
whilst keeping the underlying network unchanged. Examples of the use of over- 
lay networks include MBone [2] and 6Bone [3] . There has already been research 



J. Sole-Pareta et al. (Eds.): QofIS 2004, LNCS 3266, pp. 164—173, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 




On the Construction of QoS Enabled Overlay Networks 165 



on the possibilities of allowing alternative route selection via an overlay net- 
work. In [4] the authors discuss an overlay network that is able to dynamically 
react to path outages and [5] discusses QoS aware routing algorithms for overlay 
networks. 

An overlay network can be used to route traffic by sending the data between
overlay servers. Monitoring the condition of the overlay links allows an overlay
network to obtain information about the state of the Internet and to react
dynamically to link congestion or quality degradation by sending its traffic via
routes that still fulfill the QoS demands of a connection. Essential to the
functioning of such an overlay network is the location of the different overlay
servers: good server placement algorithms will greatly increase the quality of the
overlay network. In [6] the server placement problem for overlay networks is studied
by determining server locations such that the distance of every client to its
nearest overlay server is minimized. Here, however, we argue that an important
source of QoS degradation is situated in the core network. We will thus study
algorithms that place the overlay servers on a best-effort network in such a way
that connections will have end-to-end paths that fulfill both their bandwidth
and their delay requirements.

Fig. 1 illustrates the use of an overlay network to provide a route selection
infrastructure. We see two clients in a small IP network consisting of five routers.
The delay on every link is shown; if standard IP routing is used, the delay
between routers 1 and 2 is 200 (situation (a)). However, by deploying
an overlay server near router 4, the traffic can be redirected, so that the route
chosen for the traffic between the clients has a much lower delay of 30
(situation (b)).
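The effect shown in Fig. 1 can be reproduced with a small sketch that compares the direct IP route with routes relayed through one overlay server. Only the totals 200 and 30 come from the text; the split of the overlay route into 10 (router 1 to router 4) and 20 (router 4 to router 2) is a hypothetical assumption, as are the function and parameter names. The sketch also restricts overlay routes to at most one relay, whereas the algorithms in Section 2 allow longer overlay paths.

```python
def best_overlay_route(src, dst, ip_delay, servers):
    """Lowest-delay choice between the direct IP route and routes that relay
    through exactly one overlay server (a simplification for illustration).

    ip_delay: {(u, v): delay of the route standard IP routing uses from u to v}
    servers:  routers that have an overlay server attached
    """
    best = (ip_delay[(src, dst)], [src, dst])  # plain IP routing, situation (a)
    for s in servers:
        if s in (src, dst):
            continue
        delay = ip_delay[(src, s)] + ip_delay[(s, dst)]
        if delay < best[0]:
            best = (delay, [src, s, dst])      # relayed route, situation (b)
    return best


# Hypothetical values: only the totals 200 and 30 are taken from Fig. 1.
ip_delay = {(1, 2): 200, (1, 4): 10, (4, 2): 20}
print(best_overlay_route(1, 2, ip_delay, servers=[4]))  # (30, [1, 4, 2])
print(best_overlay_route(1, 2, ip_delay, servers=[]))   # (200, [1, 2])
```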




Fig. 1. Routing via an overlay network



This paper is organized as follows: Section 2 gives a full description of the
problem we want to solve and discusses an optimal server placement algorithm
and a number of heuristics. In Section 3 we present the evaluation of the different
algorithms. Conclusions are drawn in Section 4 and future work is addressed in
Section 5. As multimedia services are often multicast in nature
and overlay networks are also able to provide multicast functionality [7], we have
evaluated the algorithms for both the unicast and the multicast case.
2 Algorithms

The goal of the algorithms is to find the locations of a number of overlay servers
such that the resulting overlay network supports as many connections as possible
from a given set of requested connections, guaranteeing the connection bandwidth
and a bounded end-to-end delay. By formulating the problem as an Integer Linear
Programming (ILP) problem, we can find an optimal solution using standard
techniques [8]. For completeness, we developed a formulation for the multicast
problem, where all connections can have multiple destinations; the unicast
formulation can easily be derived from the multicast case. The solution determines
not only the locations of the servers but also the routes
followed by the different connections. These routes load-balance
the network, as they are chosen in such a way that as many connections as
possible can be supported. We also designed three heuristics for determining a
server placement.

2.1 Multicast ILP Solution 

Input. The following notation was introduced: $G_{IP} = (V, A_{IP})$ denotes the
directed graph of the IP network with vertex set $V$ and arc set $A_{IP}$, and
$G_{ov} = (V, A_{ov})$ denotes the full-mesh overlay network constructed on
top of the IP network. The nodes in the IP network represent routers and the
arcs represent IP links connecting two routers. The arcs in the overlay graph are mapped
to routes in the IP network; the delay of an overlay arc was taken as the sum
of the delays of the IP arcs on the corresponding route. The delay on an IP
arc was assumed to be constant. The delay of an overlay arc $a_{ov}$ is denoted by
$\mathrm{DELAY}(a_{ov})$. For every arc $a_{IP}$ in the IP graph, there is a set $U(a_{IP})$
containing all the overlay arcs that use $a_{IP}$. Every IP arc $a_{IP}$ has an available bandwidth,
denoted by $\mathrm{BW}(a_{IP})$. For every node $v$ of the overlay network there is a set
$\mathrm{in}(v)$ with all the incoming arcs and a set $\mathrm{out}(v)$ with all the outgoing arcs of
that node.
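The sketch below shows one way $\mathrm{DELAY}(a_{ov})$ and the sets $U(a_{IP})$ could be derived from the IP graph. It assumes, in line with the introduction's remark that standard Internet routing selects the shortest path in hop count, that every overlay arc follows a minimum hop-count IP route; ties are broken towards the lowest-numbered neighbour, which is our own choice. The function names and the dictionary encoding are hypothetical.

```python
from collections import deque
from itertools import permutations


def ip_route(adj, src, dst):
    """Minimum hop-count route from src to dst (BFS); returns a list of IP arcs."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):  # sorted: deterministic tie-breaking
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None  # dst unreachable
    arcs, node = [], dst
    while prev[node] is not None:
        arcs.append((prev[node], node))
        node = prev[node]
    return arcs[::-1]


def build_overlay(adj, ip_delay):
    """Full-mesh overlay on top of the IP graph.

    adj: {router: [neighbours]}, ip_delay: {(u, v): delay of IP arc (u, v)}.
    Returns (delay, uses): delay[(u, v)] = DELAY of overlay arc (u, v) and
    uses[a_ip] = U(a_ip), the set of overlay arcs routed over IP arc a_ip.
    """
    delay, uses = {}, {arc: set() for arc in ip_delay}
    for u, v in permutations(adj, 2):
        route = ip_route(adj, u, v)
        if route is None:
            continue
        delay[(u, v)] = sum(ip_delay[a] for a in route)
        for a in route:
            uses[a].add((u, v))
    return delay, uses


# A three-router line 1 - 2 - 3 with hypothetical link delays.
adj = {1: [2], 2: [1, 3], 3: [2]}
ip_delay = {(1, 2): 10, (2, 1): 10, (2, 3): 20, (3, 2): 20}
delay, uses = build_overlay(adj, ip_delay)
print(delay[(1, 3)], sorted(uses[(1, 2)]))  # 30 [(1, 2), (1, 3)]
```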

The multicast connections that we want to support are bundled
in a set $C$. A multicast connection $c \in C$ has a source vertex $S(c)$ and a set of
target vertices $T(c)$. It also has a required bandwidth $\mathrm{BW}(c)$ and a maximal
delay $\mathrm{DELAY}(c)$. A multicast connection can be seen as originating from a
client located behind the source router and destined to clients located behind
the destination routers. We assumed that an overlay server can be connected
to every router in the IP network. Furthermore, we assumed that the traffic
bottleneck is in the actual IP network and therefore do not take into account
the delay/bandwidth restrictions of the links connecting the overlay servers and
clients to the routers.

We also included the possibility to limit the total number of servers used:
this number should not exceed $n$, a parameter of the algorithm.

ILP Problem formulation. The following decision variables have been introduced:

- $a_c$: equals 1 if connection $c$ is supported by the overlay network.

- $x_v$: equals 1 if there is an overlay server connected to router $v$.

- $y_{a,c}$: these binary variables describe the multicast tree for a connection $c$. If
$y_{a,c}$ equals 1, the overlay link $a$ is used to connect the source node of $c$ to a
target node of $c$. If a connection $c$ is not supported by the overlay, $y_{a,c}$
equals 0 for every overlay link $a$.

- $z_{a,c,t}$: these variables determine the end-to-end path for a target node $t$ of a
connection $c$; if an overlay arc $a$ is used for connecting the source node to
target node $t$ of connection $c$, $z_{a,c,t}$ equals 1.

The ILP formulation contains a set of constraints specifying the requirements
on the solution and an objective function describing the goal. A first set of
constraints are the flow conservation constraints, which determine, for all
connections, the paths connecting the source of a connection to every target node
of that connection. Constraints (1) and (2) make sure that a source node can
only send traffic if the connection is supported and that a source node never has
incoming links in an end-to-end path. Constraints (3) and (4) enforce analogous
constraints on the target nodes of the connection. Constraint (5) states that an
intermediate node only forwards traffic if it receives traffic.



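$$\forall c \in C,\ \forall t \in T(c):\quad \sum_{a \in \mathrm{out}(S(c))} z_{a,c,t} = a_c \qquad (1)$$

$$\forall c \in C,\ \forall t \in T(c):\quad \sum_{a \in \mathrm{in}(S(c))} z_{a,c,t} = 0 \qquad (2)$$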
The next set of constraints links the variables describing the end-to-end
paths (z variables) to the variables describing a multicast tree for every
connection (y variables). This is done by letting an arc be present in the tree if there is
a source-target path that crosses that arc (6), and by making sure that no arcs that
are not used in the end-to-end paths are present in the tree (7). Constraint (8)
states that there is only one incoming arc for every node in the overlay; this
enforces the tree property on the y variables.



$$\forall c \in C,\ \forall t \in T(c),\ \forall a \in A_{ov}:\quad z_{a,c,t} \le y_{a,c} \qquad (6)$$
The following constraints enforce the QoS requirements of the
connections and take into account the capacity limitations of the arcs. Constraint
(9) makes sure that on every IP link no more bandwidth is consumed
than is available, and (10) states that the end-to-end delay from the source to
every destination must not exceed the required maximal delay.

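$$\forall a_{IP} \in A_{IP}:\quad \sum_{c \in C} \sum_{a_{ov} \in U(a_{IP})} y_{a_{ov},c} \times \mathrm{BW}(c) \le \mathrm{BW}(a_{IP}) \qquad (9)$$

$$\forall c \in C,\ \forall t \in T(c):\quad \sum_{a \in A_{ov}} z_{a,c,t} \times \mathrm{DELAY}(a) \le \mathrm{DELAY}(c) \qquad (10)$$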
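Constraints (9) and (10) can also be checked directly for a candidate solution, which is convenient when validating ILP output or a heuristic placement. In this sketch the function name and the dictionary encoding of $y$ and $z$ are our own assumptions. As in (9), a multicast tree is charged $\mathrm{BW}(c)$ once per overlay arc, even if several targets share that arc.

```python
def check_capacity_and_delay(y, z, uses, bw_ip, bw_c, delay_ov, delay_c):
    """Return the violations of constraints (9) and (10); empty if feasible.

    y: {(a_ov, c): 0/1}, z: {(a_ov, c, t): 0/1}
    uses: {a_ip: U(a_ip)}, bw_ip: {a_ip: BW(a_ip)}, bw_c: {c: BW(c)}
    delay_ov: {a_ov: DELAY(a_ov)}, delay_c: {c: DELAY(c)}
    """
    violations = []
    # (9): bandwidth of all trees using an overlay arc routed over a_ip
    for a_ip, overlay_arcs in uses.items():
        load = sum(bw_c[c] for (a_ov, c), used in y.items()
                   if used and a_ov in overlay_arcs)
        if load > bw_ip[a_ip]:
            violations.append(("capacity", a_ip, load))
    # (10): end-to-end delay of the path to every target t of connection c
    path_delay = {}
    for (a_ov, c, t), used in z.items():
        if used:
            path_delay[(c, t)] = path_delay.get((c, t), 0) + delay_ov[a_ov]
    for (c, t), d in path_delay.items():
        if d > delay_c[c]:
            violations.append(("delay", (c, t), d))
    return violations


# Line network 1 - 2 - 3; overlay arc (1, 3) is routed over IP arcs (1, 2) and (2, 3).
uses = {(1, 2): {(1, 2), (1, 3)}, (2, 3): {(2, 3), (1, 3)}}
bw_ip = {(1, 2): 10, (2, 3): 10}
delay_ov = {(1, 2): 10, (2, 3): 20, (1, 3): 30}
y = {((1, 3), "c1"): 1}
z = {((1, 3), "c1", 3): 1}
print(check_capacity_and_delay(y, z, uses, bw_ip, {"c1": 5}, delay_ov, {"c1": 35}))  # []
```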

A final set of constraints are the server presence constraints. They determine
where the servers enabling the multicast trees have to be placed and capture
that the number of servers placed is limited to $n$. Constraint (11) makes
sure that a node is present in the overlay if it has to forward traffic in the
multicast tree of a connection.

$$\forall c \in C,\ \forall v \in V \setminus \{S(c)\},\ \forall a \in \mathrm{out}(v):\quad x_v \ge y_{a,c} \qquad (11)$$

$$\sum_{v \in V} x_v \le n \qquad (12)$$

The objective function describes the goal of the ILP problem. As we want
to maximize the number of connections supported by the overlay network,
the model minimizes the following function:

$$\min\ \alpha \times \sum_{v \in V} x_v - \beta \times \sum_{c \in C} a_c \qquad (13)$$

The parameters $\alpha$ and $\beta$ allow the relative importance of the
number of servers placed to be adjusted in comparison to the number of connections
supported.
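Because constraint (12) bounds $\sum_v x_v$ by $n$, one way to make the number of supported connections strictly more important than the number of servers is to choose $\beta > \alpha n$. Two solutions that differ by at least one supported connection then differ in objective value by at least $\beta - \alpha n > 0$, whatever their server counts. This choice is our assumption, not necessarily the setting used in the evaluation; the function name and dictionary inputs are hypothetical.

```python
def objective(x, a, alpha, beta):
    """Value of objective (13), which the ILP minimizes.

    x: {router: 0/1} server placement variables x_v
    a: {connection: 0/1} acceptance variables a_c
    """
    return alpha * sum(x.values()) - beta * sum(a.values())


n, alpha = 4, 1.0
beta = alpha * n + 1  # beta > alpha * n: one extra connection always wins

# Solution A: all 4 servers placed, 3 connections accepted
sol_a = objective({v: 1 for v in range(4)}, {c: 1 for c in range(3)}, alpha, beta)
# Solution B: no servers placed, 2 connections accepted
sol_b = objective({v: 0 for v in range(4)}, {c: int(c < 2) for c in range(3)}, alpha, beta)
print(sol_a, sol_b)  # -11.0 -10.0
```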

Solution. To solve the ILP problem, we made use of the dual simplex method 
as described in [8]. 

2.2 Heuristics 

Random Heuristic. This heuristic determines an arbitrary server placement. 
To do this n nodes are selected at random from the collection of nodes and 
overlay servers are placed at those nodes. This heuristic is only used for reference 
purposes. 
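The random reference placement amounts to sampling $n$ distinct nodes uniformly. A minimal sketch, with a hypothetical function name and an optional seed added for reproducible experiments:

```python
import random


def random_placement(nodes, n, seed=None):
    """Random heuristic: overlay servers on n distinct nodes chosen uniformly."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(nodes), n))


print(random_placement(range(16), 4, seed=1))
```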
Best Path Heuristic (BP). First, this heuristic calculates the end-to-end
minimal delay path between every pair of nodes in the network. All the nodes
of the network are then ranked according to their occurrence in these paths, and
n overlay servers are placed in the top n nodes. The
heuristic thus chooses the nodes that lie on the largest number of minimal delay paths.
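A minimal sketch of the BP ranking, assuming the IP graph is given as `{u: {v: link delay}}` and using Dijkstra's algorithm for the minimal-delay paths. Whether the endpoints of a path are counted does not change the ranking in a connected network, since every node is an endpoint of the same number of ordered pairs. Ties are broken by node identifier, an arbitrary choice; all names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import permutations


def dijkstra_path(adj, src, dst):
    """Minimal-delay node sequence from src to dst, or None if unreachable."""
    dist, prev, done = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in adj[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def best_path_placement(adj, n):
    """Rank nodes by how often they occur on minimal-delay paths; keep top n."""
    counts = Counter({v: 0 for v in adj})
    for s, t in permutations(adj, 2):
        path = dijkstra_path(adj, s, t)
        if path:
            counts.update(path)
    return sorted(adj, key=lambda v: (-counts[v], v))[:n]


# Hypothetical 4-node network in which node 2 acts as a low-delay hub.
adj = {1: {2: 1, 3: 10}, 2: {1: 1, 3: 1, 4: 1},
       3: {1: 10, 2: 1, 4: 10}, 4: {2: 1, 3: 10}}
print(best_path_placement(adj, 1))  # [2]
```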



Minimal Servers Per Path Heuristic (MSPP). Using the optimal algo- 
rithm, this heuristic determines a path for every pair of nodes in the network. 
The requirements, in terms of delay and bandwidth, associated with each of 
those connections are chosen in such a way that they show the same behavior as 
the connections we expect the overlay network to support. The resulting paths 
have a bounded delay and use a minimal number of overlay servers. The nodes 
of the network are then ranked according to their presence as overlay servers in 
these paths and the n nodes that occur the most are selected as locations for 
overlay servers. 



3 Evaluation 

We have evaluated the algorithms for both the unicast and the multicast case.
We used a 16-node network with 46 directed arcs, which is shown in
Fig. 2. The bandwidth on the arcs was uniformly distributed in [10, 40] and the
delay on the arcs was uniformly distributed in [10, 80]. We generated sets of
connections and used the different algorithms to determine the servers needed for those
sets. The results shown are averaged
over a number of iterations. As a benchmark, we also calculated the maximal
number of connections that can be supported with both standard IP and IP
Multicast [9]. To evaluate the different algorithms, we used an ILP formulation
and the dual simplex method to determine the maximal number of connections
that can be set up, given a certain server placement for the algorithms and
given the IP topology for the IP cases. In the tests it was assumed that standard
routing in the Internet follows the shortest paths between the two end points and
that the multicast trees set up in IP multicast follow the shortest paths for every
connection. In every test, background traffic was not taken into account. The
values of the parameters $\alpha$ and $\beta$ were chosen in such a way that the number of
connections supported was more important than the number of servers needed.



3.1 Unicast 

To generate a unicast connection, we chose two nodes of the network, a source node
and a target node; all nodes had equal probability of being chosen. The bandwidth
required by every connection was 5 Mbit/s, and for the delay bound we calculated
the minimal delay between the end nodes and added 15 percent to that value.
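The request generation can be sketched as follows. The dictionary of minimal delays is assumed to be precomputed (for instance with Dijkstra's algorithm on the link delays); the function name, the field names and the use of a seeded `random.Random` are our own choices.

```python
import random


def unicast_connection(nodes, min_delay, rng, bw_mbps=5.0, slack=0.15):
    """Draw one unicast request: two distinct end nodes chosen uniformly,
    5 Mbit/s, and a delay bound of the minimal delay plus 15 percent.

    min_delay: {(u, v): minimal end-to-end delay}, assumed precomputed.
    """
    src, dst = rng.sample(sorted(nodes), 2)
    return {"source": src, "target": dst, "bw_mbps": bw_mbps,
            "max_delay": (1 + slack) * min_delay[(src, dst)]}


min_delay = {(u, v): 100 for u in range(4) for v in range(4) if u != v}
conn = unicast_connection(range(4), min_delay, random.Random(0))
print(conn)
```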

In a first test we executed the optimal algorithm for increasing numbers of 
connections. The maximal number of servers that could be placed was 4. For 
Fig. 2. The network on which the tests were executed



every number of connections, 250 sets were generated. For every set of
connections, we also determined the maximal number of connections that could be
supported simultaneously with standard IP routing. In Fig. 3, the acceptance
rate for the overlay is compared to the acceptance rate of a standard best-effort
network, and the number of servers needed to form the overlay is shown. The
results clearly show the advantage of using an overlay network over the standard
Internet. As the size of the set of connections grows, the acceptance rate decreases
for both the overlay network and standard IP. This is a
result of congestion: as more connections have to be routed, some links in the
network become congested, resulting in a lower acceptance rate. We also see that
the number of servers needed to form the overlay increases with the size of
the set. The reason for this is twofold. First, as more connections have to be
supported, some links get congested and overlay servers are needed to route around
those congested links. Second, the overlay network needs to cover
the whole network as more connections are supported.





Fig. 3. Average acceptance rate and average number of servers needed for optimal 
overlay network 



In a second test we evaluated the heuristic algorithms. The number of
iterations for this test was 50. The heuristics were used to determine the locations of
5 overlay servers. Fig. 4 illustrates the performance of the resulting overlays by
comparing them to each other and to standard IP. All the heuristics
result in an overlay network with a far greater acceptance rate than standard IP, and the
gain relative to the standard IP approach is significant. As the number of requested
connections increases, the gain decreases up to a point where it seems
to stagnate. This behavior can be explained as follows: as the number of
connections increases, the overlay network congests the links that are good in
terms of delay, whereas the IP network spreads its load more evenly over
the total collection of links. From a certain point onwards, however, both the IP
and the overlay approach have many congested links, and the overlay
still achieves a higher acceptance rate by routing around these links. The random
heuristic is clearly outperformed by the more intelligent heuristics, showing that
determining the location of the overlay servers is not trivial.





Fig. 4. Average acceptance rate and average achieved gain relative to IP of heuristics 



Finally, we show the effect of an increased number of servers on
the acceptance rate of the requested connections. The heuristic used was the
MSPP heuristic and the number of iterations was again 50. Fig. 5 clearly shows
that an increased number of servers results in a higher number of accepted
connections. It is also important to observe that even a single overlay server
results in a much higher acceptance rate.

3.2 Multicast 

To generate a multicast connection with k client nodes, a source node is chosen at
random from the network. From the remaining nodes, k nodes are chosen as the
target nodes of the multicast connection. All multicast connections required a
bandwidth of 5 Mbit/s. For the delay bound, we calculated the minimal delay between
the source node and every target node and added 15 percent to the maximum of
those delays. A multicast connection is accepted by the network if all the
destinations are reached within the delay bound.
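The multicast request generation and the acceptance rule can be sketched as below. As in the unicast case, the table of minimal delays is assumed to be precomputed; function and field names are hypothetical.

```python
import random


def multicast_connection(nodes, min_delay, k, rng, bw_mbps=5.0, slack=0.15):
    """Draw a source at random, then k distinct targets from the remaining
    nodes; the delay bound is the largest minimal source-target delay plus 15%."""
    nodes = sorted(nodes)
    src = rng.choice(nodes)
    targets = rng.sample([v for v in nodes if v != src], k)
    bound = (1 + slack) * max(min_delay[(src, t)] for t in targets)
    return {"source": src, "targets": targets, "bw_mbps": bw_mbps,
            "max_delay": bound}


def is_accepted(conn, delay_to):
    """Accepted only if every target is reached within the delay bound.

    delay_to: {target: end-to-end delay achieved on the chosen tree}.
    """
    return all(delay_to[t] <= conn["max_delay"] for t in conn["targets"])


md = {(u, v): 10 * abs(u - v) for u in range(8) for v in range(8) if u != v}
conn = multicast_connection(range(8), md, 3, random.Random(42))
print(conn, is_accepted(conn, {t: md[(conn["source"], t)] for t in conn["targets"]}))
```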

Fig. 6 shows the acceptance rate of the different heuristics and IP. The number
of clients of a connection was set to 3, and 5 servers were placed by the
Fig. 5. Average acceptance rate for different numbers of servers with MSPP heuristic



algorithms. We have also indicated the expected behavior of the ILP algorithm.
Due to the limited scalability of the ILP approach, it was not possible to
perform tests with large numbers of connections for that algorithm. The
performance of the overlay network is far higher than that of the standard
Internet, as a result of the ability of the overlay network to offer
alternative routes to the connections. When comparing the unicast case to the
multicast case, we see that the acceptance rate of IP unicast is a bit higher than
that of IP multicast. This is because IP unicast will congest more links. As the
number of connections requested from the network increases, the acceptance
rate decreases as a result of network congestion. This decrease of
the acceptance rate is also faster than in the unicast case, because
a multicast connection has 3 clients instead of 1. Although
the overlay network can make use of its multicast functionality, the load on
the network is still higher than in the unicast case, resulting in a faster
consumption of the available bandwidth. The intelligent
heuristics again outperform the random placement. The influence of the number
of servers used is also shown in Fig. 6; here we can draw the same conclusions as
in the unicast case.





Fig. 6. Average acceptance rate as a function of the number of requested connections for
the different heuristics and for different numbers of overlay servers.
4 Conclusion

We have developed and evaluated several server placement algorithms for overlay
networks. The results in this paper clearly demonstrate the potential of overlay
networks to deliver QoS for multimedia applications. For both the unicast and
the multicast case, the achievable gain from using overlay networks is significant.
Overlay networks can thus be used to provide an infrastructure that offers
guaranteed bounded delay to multimedia connections. Our results also show that
intelligently placing the overlay servers improves the performance of the overlay
network: the BP and MSPP heuristics outperform a random placement. It
was also shown that deploying more overlay servers increases the overlay
network performance.

5 Future Work 

In future papers we will discuss mechanisms that make the ILP model more
scalable by intelligently pruning overlay edges and overlay nodes on a per-connection
basis. This will allow us to test the performance of overlay networks on larger
topologies. We will also look at routing algorithms for overlay networks that give
us the ability to react dynamically to QoS degradation in the Internet.
Abstract. Resilience is becoming a key design issue for future IP-based
networks, which are of growing commercial importance. In the case of
element failures, the networks have to reconfigure within a few hundred
milliseconds, i.e. much faster than the slow rerouting of current
implementations allows. Several multi-path extensions to IP and timer
modifications have recently been proposed, providing interesting
alternatives to the use of MPLS below IP. In this paper these approaches
are first described in a common context and then compared by simulations
using very detailed simulation models. One of the main results is that
an accelerated update of the internal forwarding tables in the nodes,
together with fast hardware-based failure detection, are the most promising
measures for reaching reconfiguration times of the required order.

Keywords: Resilience, OSPF, MPLS, Simulation. 



1 Introduction 

The current situation of the Internet is marked by the development and
introduction of new real-time connection-oriented services like streaming technologies
and mission-critical transaction-oriented services. Therefore, the Internet is
gaining more and more importance for the economic success of single companies as
well as of whole countries, and network resilience is becoming a key issue in the
design of IP-based networks.

Originally, IP routing was designed to be robust, i.e. to be able to
re-establish connectivity after almost any failure of network elements. However,
the applications mentioned only allow service interruptions on the order of a few
hundred milliseconds, a time frame that cannot be reached by today's robust
routing protocols. Therefore, several extensions and modifications have recently been
proposed for speeding up IP protection performance, e.g. a simple reduction
of the most important routing timer values or the large-scale introduction of IP
multi-path operation with a fast local reaction to network element failures.
Increasingly, network operators also deploy a dedicated MPLS layer below the IP
layer, which has its own rather fast recovery mechanisms and provides failure-proof
virtual links to the IP layer.

The most important aspect in the comparison of all these approaches is the
resulting recovery speed. In order to thoroughly investigate the time-oriented
behaviour of the alternatives, we developed very detailed simulation models of the
corresponding router/switch nodes. We implemented the individual state machines
and timing constants as extensions to the basic MPLS and OSPF models of the
well-known Internet protocol simulation tool NS-2 [1]. The resulting simulator
was then integrated into a convenient tool chain that allows the flexible
selection of network topologies, traffic demands and protection mechanisms.

The rest of this paper is organized as follows: Section 2 first describes MPLS
and OSPF, starting with MPLS basics and the two most interesting MPLS
recovery mechanisms. This is followed by a description of the basic mechanisms
of OSPF, the main time constants considered in the simulator, and the
proposed extensions for faster reaction. In Section 3 we describe the simulation
framework, the enhancements implemented in the common public-domain
simulator NS-2 and the resulting tool chain. Section 4 details the measurements
we ran on the selected network topology and discusses the results obtained.
Conclusions and recommendations for future hardware and protocol generations are
given in Section 5.

2 Resilience Mechanisms 

2.1 Multiprotocol Label Switching (MPLS) 

Label Switching. Routing in IP networks is destination-based: routers
take their forwarding decisions only according to the destination address of a
packet. Therefore, routing tables are huge and the rerouting process takes a
correspondingly long time. With Multiprotocol Label Switching (MPLS), ingress
routers add labels to packets. These labels are interpreted by transit routers,
known as Label Switching Routers (LSRs), as connection identifiers and form the
basis for their forwarding decisions. Each LSR re-labels and switches incoming
packets according to its forwarding table. Label switching speeds up packet
forwarding and offers new, efficient and quick resilience mechanisms. Setting up
an MPLS path consists in establishing a sequence of labels, called a Label
Switched Path (LSP), that the packet will follow through the network. This
can be done simply using conventional routing algorithms. The main advantage
of label switching, however, appears when the forwarding decision takes Quality
of Service or link reservations into consideration. Then more complex routing
algorithms have to be used in order to achieve the most efficient usage of the
network.
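The per-hop behaviour described above, an exact-match lookup on a short label followed by a label swap, can be sketched as follows. The forwarding-table layout, node names and label values are hypothetical; the ingress is assumed to have already pushed the first label, and the egress would pop it.

```python
def forward_along_lsp(in_label, first_lsr, tables, egress, max_hops=64):
    """Follow a packet along an LSP by label swapping.

    tables: {lsr: {incoming_label: (outgoing_label, next_hop)}}
    Returns the list of (lsr, incoming_label, outgoing_label) swaps.
    """
    label, node, trace = in_label, first_lsr, []
    while node != egress:
        if len(trace) >= max_hops:
            raise RuntimeError("label forwarding loop suspected")
        out_label, next_hop = tables[node][label]  # exact match, no prefix search
        trace.append((node, label, out_label))
        label, node = out_label, next_hop
    return trace


tables = {"B": {17: (22, "C")}, "C": {22: (35, "D")}}
print(forward_along_lsp(17, "B", tables, egress="D"))
# [('B', 17, 22), ('C', 22, 35)]
```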

MPLS Recovery. MPLS recovery methods provide alternative LSPs to which
the traffic can be switched in case of a failure. We must distinguish two types
of recovery mechanisms: Protection Switching and Restoration. The former
includes recovery methods where a protection LSP is pre-calculated, so that after
failure detection all traffic only needs to be switched from the working LSP to the
backup LSP. In the latter case, the backup LSP is calculated dynamically after
the detection. Another way to classify these recovery mechanisms is by
which router along the LSP takes the rerouting decision: it can be done locally,
with the node detecting a failure immediately switching the traffic from the
working to the backup LSP, or globally, when the failure is notified to upstream and
downstream LSRs that reroute the traffic. This paper focuses on Protection
Switching schemes. Specifically, Link Protection, similar to Cisco's Fast Reroute, and
the mechanism introduced by Haskin [2] are considered further.

Link Protection provides a shortest backup path for each link of the primary
LSP. When a failure occurs on a protected link, the backup path replaces the
failed link in the LSP: the upstream router redirects incoming traffic onto the
backup path, and as soon as traffic arrives at the router downstream of the
failed link, it uses the primary LSP again. The Haskin scheme uses a global
backup path for the LSP from the ingress to the egress router. When a failure occurs
on a protected link, the upstream router redirects incoming traffic back to the
ingress router, which is thereby informed that a failure has occurred. These
packets are then forwarded on the backup path and reach the egress router.
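The difference between the two schemes shows up in the node sequence a packet traverses after the failure. In the sketch below, LSPs are node lists and all names are hypothetical. For Haskin's scheme the function returns the route of a packet that reaches the failure before the ingress has switched over; once the ingress has switched, new packets follow the backup LSP directly.

```python
def link_protection_route(primary, failed, detour):
    """Link Protection: the detour (from u to v) replaces the failed link (u, v)."""
    i = primary.index(failed[0])
    if i + 1 >= len(primary) or primary[i + 1] != failed[1]:
        raise ValueError("failed link is not on the primary LSP")
    return primary[:i] + detour + primary[i + 2:]


def haskin_route(primary, failed, backup):
    """Haskin: back along the working LSP to the ingress, then along the
    global backup LSP (which starts at the ingress) to the egress."""
    i = primary.index(failed[0])
    if i + 1 >= len(primary) or primary[i + 1] != failed[1]:
        raise ValueError("failed link is not on the primary LSP")
    forward = primary[:i + 1]      # ingress ... u
    reverse = primary[:i][::-1]    # from u back to the ingress
    return forward + reverse + backup[1:]


primary = ["A", "B", "C", "D"]
print(link_protection_route(primary, ("B", "C"), ["B", "E", "C"]))
# ['A', 'B', 'E', 'C', 'D']
print(haskin_route(primary, ("B", "C"), ["A", "F", "G", "D"]))
# ['A', 'B', 'A', 'F', 'G', 'D']
```

The longer route under Haskin's scheme illustrates why its packets may experience a larger delay during switch-over, while Link Protection needs a backup path per protected link.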





Fig. 1. MPLS recovery mechanisms 



Route distribution. There are several possible protocols for distributing labels
through the network, such as the Label Distribution Protocol (LDP) and its extension
for Constraint-based Routing (CR-LDP). Another way is to distribute labels by
piggybacking them onto other protocols, in particular the Resource Reservation Protocol
(RSVP) and its Traffic Engineering extension (RSVP-TE [3]).

2.2 OSPF 

Today, one of the most common intra-domain routing protocols in IP networks
is OSPF. This section briefly describes the OSPF mechanisms relevant for an
understanding of the general behaviour and the various processing times and
timers.



Basic OSPF mechanisms. The Hello protocol is used for detecting
topology changes. Each router periodically emits Hello packets on all its outgoing
interfaces. If a router has not received Hello packets from an adjacent router
within the "Router Dead Interval", the link between the two routers is considered
down. When a topology change is detected, the information is flooded to
neighbours via Link State Advertisements (LSAs).
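The Hello-based detection latency follows from the last Hello that got through before the failure. This sketch assumes Hellos are sent at fixed multiples of the Hello interval, arrive instantly, and restart the dead timer on reception. With the typical values of Table 1 ($T_{Hello}$ = 10 s, $T_{Dead}$ = 4 x $T_{Hello}$), the latency then lies between 30 and 40 s. The function name is hypothetical.

```python
import math


def detection_time(fail_time, hello_interval, dead_interval, phase=0.0):
    """Time at which a neighbour declares the link down after a failure.

    Assumes Hellos are sent at phase + k * hello_interval, arrive instantly,
    and each received Hello restarts the Router Dead Interval timer.
    """
    k = math.floor((fail_time - phase) / hello_interval)
    last_hello = phase + k * hello_interval  # last Hello that got through
    return last_hello + dead_interval


t_hello = 10.0
t_dead = 4 * t_hello
for fail in (20.0, 25.0, 29.9):
    print(fail, detection_time(fail, t_hello, t_dead) - fail)
```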

Each router maintains a complete view of the OSPF area, stored as an LSA
database. Each LSA represents one link of the network, and adjacent routers
exchange bundles of LSAs to synchronise their databases. When a new LSA is
received, the database is updated and the information is flooded on the outgoing
interfaces.

Route calculation: configurable cost values are associated with each link. Each
router then calculates a complete shortest path tree¹. However, only the next
hop is used for the forwarding process.

The Forwarding Information Base (FIB) of a router determines which inter- 
face has to be used to forward a packet. After each computation of routes, the 
FIB must be reconfigured. 



Main time constants. Considering the previous mechanisms, the convergence
behaviour of OSPF in case of a failure can be divided into the following steps:
detection of the failure², then flooding of LSAs and, at the same time, scheduling
of an SPF calculation, and finally a FIB update. Table 1 lists these times
along with their typical values.
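A rough convergence estimate for a router adjacent to the failure can be obtained by adding the sequential steps. This is a simplified model under our own assumptions: LSA flooding is taken to run in parallel with the SPF scheduling delay, $T_{SPFHold}$ is ignored (it only matters for consecutive SPF runs), $T_{SPF}$ = 10 ms is assumed, and the 1 s detection time of the accelerated case is a hypothetical sub-second Hello setting, not a value from the text. The other numbers are the typical values of Table 1.

```python
def ospf_convergence(t_detect, t_spf_delay, t_spf, t_fib):
    """Rough convergence time (s) at a router adjacent to the failure.

    Simplified sequential model: failure detection, SPF scheduling delay,
    SPF computation and FIB update; LSA flooding is assumed to overlap with
    the SPF scheduling delay and is therefore not added.
    """
    return t_detect + t_spf_delay + t_spf + t_fib


# Standard timers: T_Dead = 4 x 10 s (worst case), T_SPFDelay = 5 s,
# T_FIB = 300 ms (upper value); T_SPF = 10 ms is an assumption.
standard = ospf_convergence(40.0, 5.0, 0.010, 0.300)
# Accelerated: T_SPFDelay = 0; 1 s detection is a hypothetical setting.
accelerated = ospf_convergence(1.0, 0.0, 0.010, 0.300)
print(f"{standard:.2f} s vs {accelerated:.2f} s")  # 45.31 s vs 1.31 s
```

Even in this optimistic model, the remaining FIB update time keeps accelerated OSPF above the few hundred milliseconds targeted in the introduction once detection is included, which is consistent with the emphasis on fast forwarding-table updates and hardware detection.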



Proposed extensions to OSPF. With the standardized values, the
OSPF protocol needs at least a few seconds to converge. To accelerate
convergence, two options are investigated: reducing delays, and associating
multipath routing with local failure reaction. In recent years, there have been
several proposals [8,9] to accelerate OSPF convergence by reducing the main
timers: $T_{SPFDelay}$ and $T_{SPFHold}$ set to 0, and sub-second $T_{Hello}$ or hardware
failure detection. These accelerated variants of OSPF will be referred to in the
following sections as O S P F^f lo when only sub-second Hellos are used, and
OSPFf££ d when hardware detection is enabled in addition. A new
approach, proposed in [10], is to associate multipath routing with local failure
reaction. This would reduce the impact of a link failure by continuing to
send traffic on the remaining paths. The OSPF standard [11] already allows
paths with equal costs³ to be used simultaneously. In practice, however, it is not
straightforward to find link cost assignments yielding equal costs for several paths [12]. [10] presents

1 Shortest Path First (SPF) calculation 

2 by expiration of the Router Dead Interval or by reception of a new LSA 

3 Equal Cost Multi-Path (ECMP) 
Table 1. Main time constants in OSPF

| Name | Typical value | Short description |
|---|---|---|
| $T_{Hello}$ | 10 s [4] | Interval between successive Hello packets |
| $T_{Dead}$ | 4 × $T_{Hello}$ [5,6] | Router Dead Interval |
| $T_{SPF}$ | O(n log n) to O(n²) (a) | SPF calculation |
| $T_{SPFDelay}$ | 5 s [5,6] | Minimum time between LSA reception and start of SPF computation |
| $T_{SPFHold}$ | 10 s [5,6] | Minimum time between consecutive SPF computations |
| $T_{LSA}$ | 0.6-1.1 ms [7] | Process LSA: check if LSA is new and update LSA database |
| $T_{LSAFlood}$ | 33 ms [7] | LSA flooding time: process LSA, bundle LSAs and pacing timer |
| $T_{FIB}$ | 100-300 ms [7] | Update the FIB: from end of LSA processing to end of new routes installation |

(a) $2.53 \times 10^{-6} n^2 - 1.25 \times 10^{-5} n + 0.0012$, where n is the number of routers in the area; for details see [7]



a new routing scheme which provides each node in the network with two or
more outgoing links towards every destination. Two or more possible next hops
are then used at each router towards any destination instead of OSPF's single
next hop. In [10] such paths are called hammocks, due to their general structure
where the multiple outgoing paths at one node may recombine at other nodes.
The routing algorithms for calculating the hammocks were designed to
fulfill the following criteria:

1. The algorithm must propose at least two outgoing links for every node,

2. if the topology makes it impossible to fulfill the first requirement, the
algorithm should minimize the number of exceptions,

3. the algorithm must provide loop-free routing,

4. there must be no "single point of failure"⁴,

5. it should minimize the maximum path length.

A router detecting a link or port failure can then react locally, immediately
rerouting the affected traffic over the remaining next hops. In the case of a single
link failure, this local mechanism avoids the time-consuming SPF calculation and
the flooding of LSAs in the entire area. However, if multiple link failures occur and
there is no remaining alternative link at a router, the local reaction will trigger
a standard OSPF reaction. This multipath variant of OSPF will be referred to
in the following sections as O S P F^™ mock and OSPF^™ mock, depending on
which detection mechanism is used.
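The local reaction can be sketched as a filter over the precomputed hammock next hops, with a fallback to standard OSPF for destinations that lose all of them. The table layout and the names are hypothetical, and the computation of the hammocks themselves (the criteria listed above) is not shown.

```python
def react_locally(next_hops, failed_neighbour):
    """Local reaction of a router using hammock routing.

    next_hops: {destination: [neighbour, ...]} precomputed next hops (two or
    more where the topology allows). The failed neighbour is removed from every
    entry; destinations left without any next hop need a standard OSPF
    reaction (LSA flooding and SPF recalculation).
    """
    table, needs_ospf = {}, []
    for dest, hops in next_hops.items():
        remaining = [h for h in hops if h != failed_neighbour]
        table[dest] = remaining
        if not remaining:
            needs_ospf.append(dest)
    return table, needs_ospf


table, needs_ospf = react_locally({"X": ["R2", "R3"], "Y": ["R2"]}, "R2")
print(table, needs_ospf)
# {'X': ['R3'], 'Y': []} ['Y']
```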

4 Such a node would prevent at least one other node from reaching a destination if it 
fails 
Fig. 2. Tool chain



3 Simulation Framework 

In order to investigate the recovery performance of OSPF and MPLS, a simulation
tool has been implemented. Based on the simulator NS-2 [1], it uses extensions
such as the MPLS module MNS [13], the rtProtoLS module [14] and other
protocol implementations, e.g. RSVP-TE Hellos. The OSPF implementation derives
from rtProtoLS, to which a Hello protocol and timers have been added [15].
The OSPF extensions were built from this implementation by changing the
way the routes are calculated and the way reactions to a failure are handled. The
simulation scenario is specified in topology and traffic demand files in NDL
format (Network Description Language), an extension of GML [16]. NAM [1] is
also used for the visualisation of the network activity. All tools are integrated
into a comprehensive simulation framework that is easily customizable through a
simple GUI. This simulator automates the creation of OSPF or MPLS simulations
for NS-2. Figure 2 shows how the different tools are articulated within the
simulation framework. Given a topology, the MPLS Paths computation module
builds MPLS working and backup paths using Dijkstra’s algorithm and exports
them in NDL format. Supported recovery schemes are Link Protection, similar
to Cisco’s Fast Reroute, and the method of Haskin [2]. After some parameters
have been set, such as the link failures to trigger, a tool translates all NDL sources
into one NS-2 simulation file. For the OSPF simulations, the NS-2 simulator has been
extended to allow local external routing algorithms. This allows existing
routing tools to be used and routing to be developed independently of NS-2. The
results are visualized in NAM.
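The path computation step can be illustrated with a short sketch. This is not the MPLS Paths computation module itself. It is a minimal Python version of the idea, assuming an undirected graph given as a nested dict of link costs: the working path is a Dijkstra shortest path, and each link on it is protected by a bypass from the link's upstream node to its downstream node that avoids the protected link (the link-protection style of backup named above). The Haskin scheme is not shown.

```python
import heapq
from typing import AbstractSet, Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: link cost}

def shortest_path(g: Graph, src: str, dst: str,
                  banned: AbstractSet[Tuple[str, str]] = frozenset()) -> Optional[List[str]]:
    """Dijkstra's algorithm; links in `banned` (either direction) are ignored."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # first pop of dst carries its final distance
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            if (u, v) in banned or (v, u) in banned:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def link_protection_bypasses(g: Graph, working: List[str]) -> Dict[Tuple[str, str], Optional[List[str]]]:
    """For every link (u, v) of the working path, a bypass from u to v avoiding (u, v)."""
    return {(u, v): shortest_path(g, u, v, banned={(u, v)})
            for u, v in zip(working, working[1:])}

# Hypothetical 4-node topology with symmetric link costs.
g = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "C": 1, "D": 1},
     "C": {"A": 1, "B": 1, "D": 3}, "D": {"B": 1, "C": 3}}
working = shortest_path(g, "A", "D")
bypasses = link_protection_bypasses(g, working)
print(working)  # ['A', 'B', 'D']
print(bypasses)
```

When link B-D fails, the upstream node B simply switches the traffic into its precomputed bypass B-C-D. No new path computation happens at failure time, which is why protection switching can react so quickly.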
4 Measurements and Results

The focus of the investigations was on the speed of the traffic restoration after
a failure. As the main sample network, the Pan-European optical network from
the COST 239 project [17] was chosen because of its widespread use in network
investigations. This network, shown in Fig. 3, contains 11 nodes and 26 links with
capacities of 20 Gbit/s. A full mesh of equal flows between all nodes has been
used as the demand pattern. To save simulation time, the link bandwidths are scaled
down by a factor of 1000. The sources send packet flows with an 800 kbit/s constant
bit rate (CBR), i.e. packets of 500 bytes sent every 5 ms. This allows more than 20
simultaneous flows on one link without any packet loss. The simulation starts
with the establishment of the network configuration. For MPLS this includes
the setup of paths and backup paths; for OSPF it means the convergence of
the OSPF routing protocol. After the sources are started, a link failure is simulated,
triggering failure detection, dynamic route calculation if necessary, and switching
to alternative routes. To eliminate synchronisation effects between the hello timers and
the failure times, the simulations are repeated with different periods of time between
the simulation start and the failure time. The simulation is also repeated for all
possible link failures, to average over the effect of different failure locations. To
characterise the effect of the failure, the sum of the rates of all traffic received
at sinks in the network is considered over time. Fig. 4 shows the affected
traffic and the restoration times for different MPLS protection switching and
IP rerouting approaches, each with different timer values for the RSVP refresh
messages or for the OSPF hello protocol. Each curve in Fig. 4 shows the sum
of all traffic flows in the network. After a failure occurs, the sum rate
decreases since the traffic that is expected to be carried over the failed link is
lost. Just after the link is repaired, the shortest routes are used again while packets
are still on the alternate routes, which results in more packets reaching their
destination for a few milliseconds. The four curves represent the following cases:

- MPLS Link Protection⁵ with the standard RSVP-TE failure detection interval
of 5 ms (a) and with 100 ms (b).

- OSPF_hammock^hello with modified hello intervals of 100 ms (c) and OSPF_hammock^hw with
hardware failure detection of 5 ms (d).
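The traffic parameters of the COST 239 scenario above can be checked with a few lines of arithmetic. Integer arithmetic keeps the results exact. The flow count assumes one flow per ordered node pair, which the text does not state explicitly.

```python
# Parameters of the COST 239 scenario described above.
LINK_CAPACITY_BPS = 20_000_000_000   # 20 Gbit/s per link
SCALE = 1000                         # bandwidths scaled down by this factor
PACKET_BITS = 500 * 8                # 500-byte CBR packets
PACKET_INTERVAL_MS = 5               # one packet every 5 ms

flow_rate_bps = PACKET_BITS * 1000 // PACKET_INTERVAL_MS   # CBR rate of one flow
scaled_capacity_bps = LINK_CAPACITY_BPS // SCALE           # capacity in the simulation
max_flows_per_link = scaled_capacity_bps // flow_rate_bps  # flows before overload

n_nodes = 11
n_flows = n_nodes * (n_nodes - 1)  # assumption: one flow per ordered node pair

print(flow_rate_bps, scaled_capacity_bps, max_flows_per_link, n_flows)
# 800000 20000000 25 110
```

A scaled link carries 25 such flows, which agrees with the statement that more than 20 simultaneous flows fit on one link without loss.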

It can be noticed that standard MPLS protection switching (a) is much faster
than both OSPF mechanisms. Even MPLS (b), with the same T_hello and T_dead
timers as OSPF_hammock^hello, is still faster, by roughly 100 ms. This results from the
computational effort, the signalling delay and mostly from the update of the
FIBs, which is more time-consuming for the larger tables of OSPF than
for MPLS. Of course, this parameter is very implementation-dependent and
may be addressed in future router developments. The effect of hardware failure
detection is shown in Fig. 5. Hardware failure detection⁶ clearly speeds
up OSPF recovery considerably. This figure also shows a difference between

⁵ The Haskin cases give similar results regarding reconfiguration time.

⁶ This timer is set to 5 ms, which is realistic given current physical-layer capabilities.
Fig. 4. Comparison between MPLS and accelerated OSPF recovery




shortest path routing (g) and multi-path routing (h), as described in [10].
With multi-path routing the traffic is distributed over a fan of paths, including
paths longer than the shortest paths. Therefore the probability that such a path
is hit by a single link failure is higher. This results in the greater impact,
represented by the lower throughput in the case of a failure. Fig. 6 depicts
the different times involved in the extended OSPF implementation, with the
values used for the simulations. The predominant times here are the failure
detection time and the forwarding table update time. For larger networks, the LSA
processing times also have to be considered. This clearly indicates where future
improvements in OSPF and router technology are necessary: failure detection
Fig. 6. Relative size of the various times involved in OSPF implementation.



and FIB update. To reduce the failure detection time, hardware failure detection
already gives major relief. Where hardware failure detection does not
help, short hello intervals also allow faster failure detection. In [18] a protocol
is proposed that allows the use of short hello intervals independently of the routing
protocol. The other major time that has to be reduced is the FIB update time.
As mentioned above, this requires changes in the router implementation.
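Why shorter hello intervals help, and why the simulations vary the failure instant, can be made concrete with a simple timing model. It assumes that a neighbour is declared down once `dead` seconds pass without a hello, and it ignores propagation and processing delays. A failure occurring `offset` seconds after the last received hello is then detected after `dead - offset`. The delay therefore lies between dead - hello and dead, and averages about dead - hello/2 over failure instants. The 100 ms hello interval is from the text. The dead interval of four hello intervals is an assumption borrowed from the ratio commonly suggested for OSPF, since the actual T_dead value is not given here.

```python
def detection_delay(hello: float, dead: float, offset: float) -> float:
    """Delay from failure to detection with hello-based liveness checking.

    `offset` is the time between the last hello received before the failure
    and the failure itself (0 <= offset < hello). The neighbour is declared
    down `dead` seconds after that last hello, so the delay is dead - offset.
    """
    if not 0 <= offset < hello:
        raise ValueError("offset must lie in [0, hello)")
    return dead - offset

def mean_detection_delay(hello: float, dead: float, samples: int = 1000) -> float:
    """Average delay over failure instants spread evenly across one hello period."""
    offsets = [k * hello / samples for k in range(samples)]
    return sum(detection_delay(hello, dead, o) for o in offsets) / samples

# 100 ms hellos; dead interval of four hellos (assumed ratio).
hello, dead = 0.100, 0.400
print(round(mean_detection_delay(hello, dead), 3))  # 0.35
```

Both timers scale the detection delay directly. That is the motivation for fast hello protocols and for hardware detection, where the 5 ms timer replaces the whole hello mechanism.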

5 Conclusion 

The current Internet routing protocol OSPF, as it is implemented and used
today, has major deficiencies with respect to network resilience. The simulation-based
comparison with MPLS-enhanced networks shows the superior time behavior of
MPLS resilience. We have outlined several proposed extensions to
improve the resilience of routed networks. These proposals include the optimization
of timers and the use of multi-path routing with local failure reaction.
Investigating these extensions by simulation showed that they are first steps
in the right direction. From the investigations it can be concluded that there are
two major points to be addressed in order to improve the restoration speed of
OSPF rerouting: speed-up of failure detection and acceleration of forwarding
information base (FIB) updates. For the former, some very promising approaches,
like hardware failure detection and fast hello protocols (e.g. BFD [18]), are already
evolving. For the latter, the internal router architectures have to be improved.
With these extensions OSPF-routed networks will be able to reach sub-second
restoration speeds in the future.
Abstract. Next-generation wireless networks are evolving toward
an IP-based network that can provide various multimedia services seamlessly.
To establish such a wireless mobile Internet, the registration domain
supporting fast handoff is integrated with the DiffServ mechanism.
In this paper, a distributed-request-based CDMA DiffServ call admission
control (CAC) scheme is proposed for the evolving mobile Internet.
Numerical examples show that the forced termination ratio of handoff
calls is guaranteed to be much lower than the blocking ratio of new calls,
enabling seamless fast handoff, while the proposed scheme efficiently provides
a quality of service guarantee for each service class.



1 Introduction 

Provision of various realtime multimedia services to mobile users is the main
objective of next-generation wireless networks, which will be IP-based and
should interwork seamlessly with the Internet backbone [1]. The establishment
of such a wireless mobile Internet is technically very challenging. Two major
tasks are the support of seamless fast handoff and the provision of quality of
service (QoS) guarantees over IP-based wireless access networks. For realtime
traffic, handoff call processing should be fast enough to avoid high loss of
delay-sensitive packets. Note that handoff call dropping never occurs in wired
networks. To provide Grade of Service (GoS) guarantees to users and seamless
interworking with wired networks, the forced termination probability of handoff
calls in progress must at least be kept below the new call blocking probability [2].

Fast handoff requires both a fast location/mobility update scheme
and a fast call admission control (CAC) scheme. A popular scheme for fast
location update is a registration-domain-based architecture. The radio cells in
a geographic area are organized into a registration domain (e.g., a foreign
network in the TeleMIP architecture [3]), and the domain connects to the Internet
through a foreign mobile agent (FA) [3], [4], [5]. When a mobile host (MH)
moves into a registration domain for the first time, it registers the new care-of
address (the address of the FA) with its home agent. While it migrates within the
domain, mobility update messages are sent only to the FA, without registration
with the home agent, which is often located far away. This registration
process significantly reduces mobility management signaling.
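The signaling saving can be illustrated by counting updates along a movement trace. In this Python sketch, each entry of the hypothetical trace is the registration domain of the next cell the MH enters. Entering a different domain costs a home-agent registration, while a move inside the same domain only updates the FA. It is a counting illustration under these assumptions, not a model of TeleMIP message formats or timing.

```python
from typing import List, Optional, Tuple

def count_updates(domains_visited: List[str]) -> Tuple[int, int]:
    """Return (home-agent registrations, FA-only updates) for a trace of
    cells, each labelled with the registration domain it belongs to."""
    ha_regs, fa_updates = 0, 0
    current: Optional[str] = None
    for domain in domains_visited:
        if domain != current:
            ha_regs += 1       # new care-of address (FA address) sent to the HA
            current = domain
        else:
            fa_updates += 1    # intra-domain move: update sent to the FA only
    return ha_regs, fa_updates

# Hypothetical trace of six cells in two registration domains.
trace = ["D1", "D1", "D1", "D2", "D2", "D1"]
print(count_updates(trace))  # (3, 3)
```

Without registration domains, each of the six moves would require a registration with the distant home agent. Here only three do.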

The differentiated services (DiffServ) architecture has been proposed as a
scalable way to provide quality of service (QoS) in IP networks [6]. In this
paper, to provide DiffServ-based QoS over wireless networks, the registration
domain is modeled as a DiffServ administrative domain, with the FA router as
the edge router connecting to the Internet backbone and the base stations as
the edge routers providing Internet access to mobile hosts (MHs). A bandwidth
broker manages the resource allocation over the DiffServ registration domain.
We consider wireless links as the bottleneck links in the domain, and the service level
agreement (SLA) is negotiated mainly based on wireless resource availability.
The DiffServ architecture provides three service classes: premium
service, assured service, and best-effort service. The premium service is ideal for
realtime services such as IP telephony, video conferencing and the like [7]. The
assured service was proposed to ensure an expected capacity and a low delay,
avoiding excessive delay for non-realtime applications [8]. In our scheme,
voice and videophone services are considered premium services.

In this paper, we propose a distributed-request-based CDMA DiffServ call
admission control (CAC) scheme over such a DiffServ registration domain, to
achieve seamless fast handoff, QoS guarantees for each service class, and high
utilization of the scarce wireless frequency spectrum. The time frame consists
of an access slot and a transmission slot for distributed-request-based code
assignment. An access permission probability is adaptively given to each service
user, and handoff calls receive a higher access permission probability than
new calls. For the premium services (voice and videophone),
code reservation is allowed to guarantee throughput, and the forced termination
probability of handoff calls is guaranteed to be lower than the new call blocking
probability. For the assured data service, the proposed scheme keeps the average
transmission delay much lower than that of the best-effort data service
through the provision of reserved codes. Thus, a higher data throughput than the
best-effort data service can be guaranteed to the assured data service.

Numerical examples using the EPA (Equilibrium Point Analysis) method [9]
show that the proposed CAC scheme can determine the capacities of the
differentiated services in a cell that satisfy the requirements of next-generation
wireless networks (e.g., a minimized forced termination ratio for seamless fast
handoff and avoidance of excessive delay for non-realtime applications when
providing multimedia services). These capacities are required to determine the
resource requirement in the SLA negotiation between a DiffServ registration domain
and the Internet service provider.

Fig. 1. A DiffServ registration-domain-based wireless network architecture



2 Proposed DiffServ CAC Scheme 

As illustrated in Fig. 1, the system under consideration is a DiffServ
registration-domain-based wireless network architecture where TeleMIP is used to
manage mobility and support fast handoff. The FA router is the interface connecting
to the DiffServ Internet backbone, where an SLA is negotiated to specify the
resources allocated by the Internet service provider to serve the aggregate traffic
flowing from/into the FA router. The FA router conditions the aggregate traffic
for each service class according to the SLA resource commitments. The base
stations provide MHs with access points to the Internet and perform per-flow
traffic conditioning and marking for data flowing in the uplink direction. In the
following, we focus on a CAC scheme for seamless QoS provisioning.



2.1 System Description 

We consider a multi-rate transmission CDMA system that is required to
support multi-class services such as voice and videophone premium services and
assured and best-effort data services. In this paper, an uplink CAC scheme is
proposed for this system. The time scale is organized in frames
containing an access slot and a transmission slot, as shown in Fig. 2. When a
terminal has new data ready to transmit, it first sends an access packet spread
by a randomly chosen access code in the access slot, according to a given
access permission probability, in order to reserve a code for data transmission.
If no other contending terminal sends an access packet with the same
access code, no collision with its access packet occurs, and the base station can
identify the terminal that sent the access packet. The base station then returns,
over the downlink, an acknowledgment with the assignment information of an
available code for data transmission. The terminals that succeed in code
reservation send traffic packets spread by their assigned transmission codes in the
transmission slot [10]. Therefore, the terminals can send data without collision
after code reservation.
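The access procedure above suggests a simple collision model. Under assumptions that are not taken from the paper, the success probability of one contender has a closed form. Each of n contenders transmits independently with permission probability p and picks one of k access codes uniformly at random. An access packet succeeds only if no other contender chose the same code, and interference between different codes is ignored. A tagged terminal then succeeds with probability p(1 - p/k)^(n-1). This is a back-of-the-envelope sketch, not the EPA model used for the paper's numerical results.

```python
def access_success_prob(n: int, p: float, k: int) -> float:
    """Probability that a given contender's access packet is collision-free.

    Assumed model: n contenders each transmit with probability p on one of
    k uniformly chosen access codes; each other contender hits the same code
    with probability p / k, independently.
    """
    if n < 1 or k < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 1, k >= 1 and 0 <= p <= 1")
    return p * (1.0 - p / k) ** (n - 1)

def expected_successes(n: int, p: float, k: int) -> float:
    """Mean number of collision-free access packets in one access slot."""
    return n * access_success_prob(n, p, k)

# 35 shared access codes (K_a - K_v with K_a = 43 and K_v = 8, as in Sect. 3).
for n in (10, 35, 70):
    print(n, round(expected_successes(n, 1.0, 35), 2))
```

With p = 1 the expected number of successful requests peaks and then falls as contention grows. The adaptive access permission probability is meant to prevent this fall by throttling p when many users contend.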



[Figure: frames k−1, k and k+1, each consisting of an access slot (transmission-request slot) followed by a packet transmission slot; traffic packets carry a contention-free piggybacking request]

Fig. 2. Uplink frame structure



If a packet collision occurs in the access slot and the base station sends no
response, the terminal retries code reservation in the next frame. If the
access packet of a terminal is successfully received by the base station but
no transmission code can be assigned to the terminal due to a lack of
transmission codes, the terminal also retries code reservation in the next frame.
This is because, in the proposed protocol, the base station does not keep a
request table recording such terminals. With this distributed-request-based
protocol, the mobile host can request call set-up for both uplink and
downlink data transmissions. This lets the proposed scheme perform admission
control faster than a scheme in which a sender-initiated call set-up request
is issued, even in wireless networks, only after the long location/mobility-update
delay of the Mobile IP protocol has elapsed [3], [5].

2.2 Distribution of Codes and Access Permission Probabilities 

To reduce packet collisions in the access slot, transmission code
reservation through the piggybacking request and the access permission probability
are introduced. All voice and videophone premium service users, but not the
data service users, can request continued reservation of the acquired transmission
code for the next frame throughout their call, by sending traffic packets that
carry a piggybacking request bit indicating either an additional transmission
request or no further transmission. According to the piggybacking request
information from the users, the base station keeps the assigned
transmission codes reserved for the next frame during the call duration, to guarantee
throughput for premium service users.
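The piggybacking rule can be pictured as a reservation table at the base station. The Python sketch below uses a hypothetical `CodeTable` class. It keeps a premium user's code for the next frame while the user's piggyback bit is set and returns the code to the pool otherwise. A user that finds no free code simply has to retry in the next frame, since no request table is kept. Access contention and the split into K_v, K_vi and K_as code sets are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

@dataclass
class CodeTable:
    """Hypothetical base-station view of transmission-code reservations."""
    free_codes: Set[int]
    reserved: Dict[str, int] = field(default_factory=dict)  # user -> code

    def assign(self, user: str) -> bool:
        """Give a free code to a user whose access packet was received."""
        if not self.free_codes:
            return False  # no code available: the user retries next frame
        self.reserved[user] = self.free_codes.pop()
        return True

    def end_of_frame(self, piggyback_bits: Iterable[Tuple[str, bool]]) -> None:
        """Keep a reservation for the next frame iff its piggyback bit is set."""
        for user, wants_more in piggyback_bits:
            if not wants_more and user in self.reserved:
                self.free_codes.add(self.reserved.pop(user))

table = CodeTable(free_codes={0, 1})
first = table.assign("v1")
second = table.assign("vi1")
third = table.assign("v2")   # False: both codes are taken
table.end_of_frame([("v1", False), ("vi1", True)])
print(sorted(table.reserved), len(table.free_codes))  # ['vi1'] 1
```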

As shown in Fig. 3, K_a spreading codes are used for sending the access packets,
and a separate set of K_t spreading codes is used for sending the traffic
packets. However, only K_a − K_v of the access codes are shared among all
service users. The remaining K_v access codes are fixedly assigned to those
voice users that succeed in reserving a transmission code using one of
the K_a − K_v shared access codes. Likewise, K_v of the K_t transmission
codes are used for voice users, and K_vi and K_as transmission codes are
used for the videophone and the high-rate assured data service users, respectively.
Note that there are no reserved transmission codes for the low-rate best-effort
service users. Instead, they use the codes among the K_v voice transmission codes
that remain after assignment to voice users. They can also use
transmission codes that are assigned to voice users but left unused
during the silent periods of voice calls, so that transmission codes are used efficiently.
For voice users, the assigned transmission codes are released during silent
periods through the piggybacking request, because no traffic packets
are transmitted during that time. However, when a voice user enters the
talk-spurt state from the silent state, it transmits the access packet
denoted A in Fig. 3 to request re-reservation of the acquired transmission code.
Since access packet A is spread by the access code fixedly assigned to the user
(i.e., one of the K_v access codes) for its entire call duration, no voice packets
are dropped. When the voice call is over, the voice user transmits the access packet
denoted T, using its fixedly assigned access code, to indicate call termination.
v(j), vi(j): the j-th voice and videophone premium service user

d_as: high-rate assured data service user; d_be: low-rate best-effort data service user
A: the access packet to request re-reservation of the acquired transmission code
T: the access packet to indicate call termination



Fig. 3. Assignment scheme of transmission codes 



Next we describe the access permission probability. It is calculated by the base
station every frame, based on the number of available transmission codes and the
estimated number of contending users for each service, and is then broadcast over
the downlink to all users of each service in the cell. As
shown in Fig. 3, R_v,i and R_vi,i denote the numbers of reserved transmission codes
for the connected voice and videophone users at the i-th frame, respectively.
R_t,i denotes the number of voice users in the talk-spurt state at the i-th
frame. Then, from Fig. 3, the numbers of available transmission codes
for the voice, videophone, high-rate assured data, and low-rate best-effort
data services at the i-th frame are K_v − R_v,i, K_vi − R_vi,i,
K_as + K_vi − R_vi,i, and K_v − R_t,i, respectively. Since the available transmission
codes are assigned first to voice users, the low-rate best-effort data service
users whose access packets are received successfully use the remaining K_v − R_v,i
codes not assigned to voice users and the R_v,i − R_t,i codes left idle during the silent
periods of voice calls. We denote the estimated numbers of
contending users of new calls and handoff calls for the voice service by C_vn
and C_vh, those of new calls and handoff calls for the videophone service by
C_vin and C_vih, and those for the high-rate assured and low-rate best-effort data
services by C_as and C_be, respectively. Then, the access permission probabilities of
new calls and handoff calls for the voice and videophone services, P_vn,
P_vh, P_vin and P_vih, and those for the assured and best-effort data services,
P_as and P_be, are given by Eq. (1). Priority is given to handoff calls by
setting P_vh and P_vih to 1. We also set the access permission probabilities of
forcedly terminated handoff calls to 1.



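$$
P_{vn}=\min\left\{1,\frac{K_v-R_{v,i}}{C_{vn}}\right\},\qquad
P_{vin}=\min\left\{1,\frac{K_{vi}-R_{vi,i}}{C_{vin}}\right\},\qquad
P_{vh}=P_{vih}=1,
$$
$$
P_{as}=\min\left\{1,\frac{K_{as}+K_{vi}-R_{vi,i}}{C_{as}}\right\},\qquad
P_{be}=\min\left\{1,\frac{(K_v-R_{v,i})+(R_{v,i}-R_{t,i})}{C_{be}}\right\}
\tag{1}
$$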
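Eq. (1) and the available-code counts that feed it can be computed per frame as follows. This is a minimal Python sketch. The K values (8, 5, 3) are those of Sect. 3, while the frame state and contender estimates in the example are hypothetical. The text does not say what happens when a class has no contenders; returning probability 1 in that case is an assumption made here to avoid division by zero.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class FrameState:
    k_v: int    # transmission codes for voice (K_v)
    k_vi: int   # transmission codes for videophone (K_vi)
    k_as: int   # transmission codes for high-rate assured data (K_as)
    r_v: int    # codes reserved by connected voice users in frame i (R_v,i)
    r_vi: int   # codes reserved by connected videophone users (R_vi,i)
    r_t: int    # voice users in talk-spurt state (R_t,i)

def available_codes(s: FrameState) -> Dict[str, int]:
    """Available transmission codes per service in frame i."""
    return {
        "voice": s.k_v - s.r_v,
        "videophone": s.k_vi - s.r_vi,
        "assured": s.k_as + s.k_vi - s.r_vi,
        "best_effort": (s.k_v - s.r_v) + (s.r_v - s.r_t),
    }

def permission_probs(s: FrameState, c: Dict[str, float]) -> Dict[str, float]:
    """Access permission probabilities of Eq. (1); `c` holds the estimated
    numbers of contending users (keys vn, vin, as, be)."""
    avail = available_codes(s)

    def ratio(codes: int, contenders: float) -> float:
        # Assumption: with no contenders there is nothing to throttle.
        return 1.0 if contenders <= 0 else min(1.0, codes / contenders)

    return {
        "vn": ratio(avail["voice"], c["vn"]),
        "vin": ratio(avail["videophone"], c["vin"]),
        "vh": 1.0,   # handoff calls are always permitted
        "vih": 1.0,
        "as": ratio(avail["assured"], c["as"]),
        "be": ratio(avail["best_effort"], c["be"]),
    }

s = FrameState(k_v=8, k_vi=5, k_as=3, r_v=6, r_vi=4, r_t=3)
probs = permission_probs(s, {"vn": 4, "vin": 2, "as": 8, "be": 20})
print(available_codes(s))  # {'voice': 2, 'videophone': 1, 'assured': 4, 'best_effort': 5}
print(probs["vn"], probs["be"])  # 0.5 0.25
```

The new-call probabilities shrink as the ratio of free codes to contenders falls, while the handoff probabilities stay at 1. This is the mechanism that later keeps forced terminations below new-call blocking.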



3 Numerical Examples and Discussions 

In this section, we discuss the steady-state performance of the proposed CAC
scheme through numerical examples obtained using the EPA method. We chose
the multi-processing gain CDMA technique for the multi-rate transmission
CDMA system, using random code sequences and BPSK modulation
within bandwidth B [11]. Whatever kind of codes is used, the performance of the
proposed CAC scheme is influenced by the parameters K_v, K_vi, K_as, and K_a.
We assume that all data are transmitted with the same bit-energy-to-noise
ratio E_b/N_0 of 20 dB and that there is no inter-cell interference (ICI). Then,
by using the multi-processing gain CDMA technique and setting K_v, K_vi, and K_as
to 8, 5, and 3, respectively, the CDMA system can provide voice and low-rate
best-effort data services at a bit rate of 24 kbps with a BER of 10^-4, and
videophone and high-rate assured data services at a bit rate of 72 kbps with a
BER of 10^-5. Also, K_a is set to 43 to guarantee a BER of 3 × 10^-4 for access
packets at a 24 kbps data rate [11].

We assume that call processing and transmission code assignment
are completed simultaneously and that data service users have infinite buffers.
The wireless bandwidth B is set to 4.096 MHz as in UTRA W-CDMA, and
the frame rate of the videophone service using the H.263 coding technique is set to
50 frames/s, corresponding to a frame length of 20 ms [12]. The forced termination
ratio of handoff calls is set equal to the blocking ratio of handoff calls because the
delay constraint on handoff calls is set to zero frames. In the voice service subsystem
model, the mean talk-spurt duration and the mean silent duration are set to 1 s
and 1.35 s. The voice activity ratio is set to 0.43, and the mean call holding times
of the voice and videophone services are set to 3 min. The activity ratio of
data traffic is set to 0.05 by assuming that the mean active time and the mean idle
time are 19 s and 360 s for the low-rate best-effort data service and 57 s and
1080 s for the high-rate assured data service [9]. For voice and videophone service
users, λ_vn, λ_vh, λ_vin and λ_vih denote the arrival rates of new calls and handoff
calls for each service. λ_vn is set to 152 calls/hour and λ_vin to 95 calls/hour.
We also set the λ_vh/λ_vn and λ_vih/λ_vin ratios to 0.5, which reflects the mobility
of the users. New call generation rates for the services are set to 0.2 times the
call termination rates.
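Several of these parameters are ratios of others, and the following Python lines recompute them as a consistency check. The voice activity ratio 0.43 matches 1/(1 + 1.35), and both data classes give an activity ratio of about 0.05. The handoff arrival rates and the bits carried per 20 ms frame are derived here and are not stated in the text.

```python
# Parameter values quoted above (Sect. 3).
talk_spurt_s, silence_s = 1.0, 1.35
be_active_s, be_idle_s = 19.0, 360.0    # low-rate best-effort data
as_active_s, as_idle_s = 57.0, 1080.0   # high-rate assured data
lambda_vn, lambda_vin = 152.0, 95.0     # new-call arrival rates, calls/hour
handoff_ratio = 0.5                     # lambda_vh/lambda_vn = lambda_vih/lambda_vin
frame_s = 0.020                         # 20 ms frame

voice_activity = talk_spurt_s / (talk_spurt_s + silence_s)
be_activity = be_active_s / (be_active_s + be_idle_s)
as_activity = as_active_s / (as_active_s + as_idle_s)
lambda_vh, lambda_vih = handoff_ratio * lambda_vn, handoff_ratio * lambda_vin

# Bits carried per frame at the two bit rates (24 and 72 kbit/s).
bits_low, bits_high = round(24_000 * frame_s), round(72_000 * frame_s)

print(round(voice_activity, 2), round(be_activity, 2), round(as_activity, 2))
# 0.43 0.05 0.05
print(lambda_vh, lambda_vih, bits_low, bits_high, round(1 / frame_s))
# 76.0 47.5 480 1440 50
```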

Figures 4 and 5 show how the blocking ratios for the voice and videophone
services, respectively, vary under given λ_vh/λ_vn and λ_vih/λ_vin ratios when changing
the total number of users for each service in a cell (i.e., M_v and M_vi). We can
see that the blocking ratio for each service grows as the total number
of users increases. This is because the number of contending users for each service
is proportional to the total number of its users in a cell. We can also see that
the blocking ratio of new calls increases as the arrival rate of handoff calls,
which have a higher access permission probability, increases. Likewise, the forced
termination ratio of handoff calls increases, due to the increase in the number
of contending users, as the arrival rate ratio of handoff calls to new calls
increases. In both figures, we can observe turning points beyond which
the blocking ratios increase sharply: if M_v or M_vi is larger than the turning
point, the blocking ratio becomes very large. On the whole, the forced
termination ratio of handoff calls is smaller than the blocking ratio of new calls,
and the difference between the ratios becomes more significant beyond the turning
point. This is because we give priority to handoff calls over new calls through
the access permission probability, and the access permission probability of
new calls becomes much smaller than that of handoff calls as the number of
contending users increases (see Eq. (1)). If the delay constraint on handoff
calls is set to a larger, non-zero number of frames, this effect, which supports
GoS guarantees to users and seamless fast handoff, also becomes larger, while
24 kbps and 72 kbps throughputs are guaranteed for the voice and videophone
premium services, respectively.

In the proposed CAC scheme, the low-rate best-effort data users who have
accessed successfully can use the remaining K_v − R_v,i transmission codes not
assigned to voice users, and the high-rate assured data users can use the
remaining K_vi − R_vi,i codes not assigned to videophone users in the i-th frame.
Therefore, we investigate the impact of the numbers of contending voice and
videophone users on the average delay of the best-effort and assured data
services, respectively. To compare performance, we consider the average packet
transmission delay needed to achieve a reference throughput of 24 kbps.
Figure 6 shows how the average delay of the low-rate best-effort data service
varies under given numbers of contending voice users when changing
the total number of low-rate best-effort data users in a cell,
M_be. We can see that the average delay increases rapidly as the number of
Fig. 4. New call blocking and forced termination ratios for voice service




Fig. 5. New call blocking and forced termination ratios for videophone service 
Fig. 6. Average delay comparison for the data services







Fig. 7. Average throughput comparison for the data services 



contending voice users or M_be becomes larger. Similar results are obtained
for the high-rate assured data service. However, for the assured data
service the average packet transmission delay is much lower than that of the
best-effort service, owing to the K_as reserved codes. Thus, a higher data
throughput than that of the best-effort data service is guaranteed to the assured
data service, as shown in Fig. 7. Note, however, that three assured data service
users can obtain the 72 kbps data throughput as long as the number of contending
videophone users does not exceed 21.5. Beyond that, more contending users
cause more packet collisions in the access slot, so that the data throughput
decreases.
4 Conclusion

In this paper, for seamless fast handoff in the wireless mobile Internet, we have
proposed a distributed-request-based CDMA DiffServ CAC scheme. Through
the presented code assignment and access permission probability schemes, we
have shown that it can guarantee a constant GoS to handoff calls. Thus, their
forced termination probability is guaranteed to be much lower than the new call
blocking probability, enabling seamless fast handoff. Furthermore, the proposed CAC
scheme can efficiently provide QoS guarantees for each service class in multi-rate
transmission cellular CDMA systems.
Abstract. The intention to adopt the IP protocol for future mobile
communication and the subsequent extension of Internet services to the air interface
call for advanced performance modeling approaches. To provide a tool
for accurate performance evaluation of IP-based applications running
over wireless channels, we propose a novel cross-layer wireless
channel modeling approach. We extend the small-scale propagation model
representing the received signal strength to the IP layer using cross-layer
mappings. The proposed model is represented by the IP packet error process
and retains the memory properties of the initial signal strength process.
Unlike the approaches developed to date, our model requires less
restrictive assumptions regarding the behavior of the small-scale propagation
model at layers above the physical layer. We compare results obtained using
our model with those published to date and show that our approach
yields more accurate estimates of IP packet error probabilities.



1 Introduction 

While next generation (NG) mobile systems are not completely defined, there
is common agreement that they will rely on the IP protocol as a consistent
end-to-end transport technology. The motivation is to introduce a unified service
platform for the future ’mobile Internet’, known as ’NG All-IP’ mobile systems.

To date only a few studies devoted to IP layer performance evaluation at the
air interface have been published. A survey of the literature has shown that most
studies were devoted to the analysis of data-link layer protocols [1,2]. Additionally,
the approaches developed to date adopt quite restrictive assumptions regarding the
performance of wireless channels at layers above the physical layer. As a result,
they may lead to incorrect estimation of IP layer performance parameters.

In this paper we propose a novel cross-layer wireless channel modeling
approach. We extend the small-scale propagation model representing the received
signal strength to the IP layer using cross-layer mappings. The proposed model
is represented by the IP packet error process, retains the memory properties of the
initial signal strength process and captures specific peculiarities of protocols at
layers below IP. We show that our approach yields more accurate estimates of
IP packet error probabilities than the approaches used to date.
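To illustrate why channel memory matters when errors are mapped up to the IP layer, the sketch below compares two packet error probabilities for an L-bit packet with the same average bit error rate. One assumes independent bit errors; the other uses a two-state Gilbert-Elliott channel. The Gilbert-Elliott model is a stand-in chosen for illustration. It is not the model proposed in this paper, and all parameter values are hypothetical.

```python
from typing import List, Sequence

def stationary_two_state(p_gb: float, p_bg: float) -> List[float]:
    """Stationary distribution [pi_G, pi_B] of a two-state Markov chain."""
    return [p_bg / (p_gb + p_bg), p_gb / (p_gb + p_bg)]

def ge_packet_error(p_gb: float, p_bg: float, ber: Sequence[float], length: int) -> float:
    """Packet error probability of an L-bit packet on a Gilbert-Elliott channel.

    Forward recursion: alpha[s] is the probability that all previous bits were
    received correctly and the current bit is sent while the channel is in s.
    """
    trans = [[1 - p_gb, p_gb], [p_bg, 1 - p_bg]]
    alpha = stationary_two_state(p_gb, p_bg)  # channel state of the first bit
    for bit in range(length):
        ok = [alpha[s] * (1 - ber[s]) for s in range(2)]  # this bit is correct
        if bit == length - 1:
            return 1 - sum(ok)
        # the channel moves to the state of the next bit
        alpha = [sum(ok[r] * trans[r][s] for r in range(2)) for s in range(2)]
    return 0.0  # length == 0: an empty packet cannot be corrupted

def memoryless_packet_error(avg_ber: float, length: int) -> float:
    """Packet error probability if bit errors were independent."""
    return 1 - (1 - avg_ber) ** length

# Hypothetical bursty channel: error-free good state, very noisy bad state.
p_gb, p_bg, ber, length = 0.001, 0.1, (0.0, 0.5), 1000
pi = stationary_two_state(p_gb, p_bg)
avg_ber = pi[0] * ber[0] + pi[1] * ber[1]
ge = ge_packet_error(p_gb, p_bg, ber, length)
iid = memoryless_packet_error(avg_ber, length)
print(ge, iid)
```

With bursty errors, the bit errors cluster inside fewer packets, so the memoryless mapping overestimates the packet error probability considerably for these parameters. When both states have the same bit error rate, the two mappings coincide.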
Our paper is organized as follows. In Section 2 we overview the propagation
characteristics of wireless channels and the models used to capture them. In Section
3 we propose our extension to the IP layer and define a model that provides IP packet
error probabilities. In Section 4 we provide a numerical comparison of our approach
with the one widely used in the literature. Conclusions are drawn in the last section.



2 Propagation Characteristics of Wireless Channels 

The propagation path between a transmitter and a receiver may vary from
a simple line-of-sight (LOS) path to very complex ones due to diffraction, reflection
and scattering. Propagation models are used to represent the performance of wireless
channels. We distinguish between large-scale and small-scale propagation models [3].

When a mobile user moves away from the transmitter over large distances, the
local average received signal strength gradually decreases. This signal strength
is predicted using large-scale propagation models. However, such models [4,5,6]
do not take into account rapid variations of the received signal strength.
As a result, they cannot be used effectively in performance evaluation studies.
Indeed, when a mobile user moves over short distances, the instantaneous signal
strength varies rapidly, because the received signal is a sum of many
components coming from different directions. Propagation models characterizing
rapid fluctuations of the received signal strength over short time durations are
called small-scale propagation models. In the presence of a dominant non-fading
component, the small-scale propagation distribution is Rician. As the dominant
component fades away, the Rician distribution degenerates to a Rayleigh distribution.

Small-scale propagation models capture the characteristics of the wireless channel at a finer granularity than large-scale ones. Additionally, these models implicitly take into account movements of users over short travel distances [7,8]. In what follows, we restrict our attention to small-scale propagation models.



2.1 Model of Small-Scale Propagation Characteristics 

Assume a discrete-time environment, i.e. the time axis is slotted and the slot duration is constant, given by $\Delta t = t_{i+1} - t_i$, $i = 0, 1, \dots$. We choose $\Delta t$ equal to the time to transmit a single symbol over the wireless channel. Hence, the choice of $\Delta t$ depends on the properties of the physical layer.

Small-scale propagation characteristics are often represented by the stochastic process $\{L(n), n = 0, 1, \dots\}$ modulated by the discrete-time Markov chain $\{S_L(n), n = 0, 1, \dots\}$, $S_L(n) \in \{1, 2, \dots, M\}$, each state of which is associated with a conditional probability distribution function of the received signal strength [9,10]. The underlying modulation makes it possible to capture the autocorrelation properties of the signal strength process. Since the Markov process $\{S_L(n), n = 0, 1, \dots\}$ is allowed to change state in every time slot, every bit may experience a different received signal strength.

An illustration of such a model is shown in Fig. 1, where states are associated with conditional distribution functions $F_L(k\Delta f \mid i)(\Delta f) = \Pr\{L(n) = k\Delta f \mid S_L(n) = i\}$, $k = 1, 2, \dots, N$, $i = 1, 2, \dots, M$, where $N$ is the number of bins into which the signal strength is partitioned and $\Delta f$ is the discretization interval. Let $D_L$ and $\pi_L = (\pi_1, \pi_2, \dots, \pi_M)$ be the one-step transition probability matrix and the stationary probability vector of $\{S_L(n), n = 0, 1, \dots\}$, respectively. The parameters $M$, $D_L$ and $F_L(k\Delta f \mid i)(\Delta f)$ must be estimated from statistical data [9,10]. For ease of notation we will use $F_L(k \mid i)$ instead of $F_L(k\Delta f \mid i)(\Delta f)$.

Fig. 1. An illustration of the Markov model for small-scale propagation characteristics.
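Once $D_L$ has been estimated, the stationary vector $\pi_L$ follows from $\pi_L D_L = \pi_L$, $\pi_L e = 1$. A minimal sketch, assuming plain power iteration (our choice; any linear solver would do) and using, purely as an example, the four-state transition matrix that appears in Section 4:

```python
def stationary(D, tol=1e-13, max_iter=100_000):
    """Return pi with pi D = pi and sum(pi) = 1 for a row-stochastic matrix D.

    Power iteration pi_{t+1} = pi_t D started from the uniform vector; it
    converges for irreducible, aperiodic chains such as the one below.
    """
    M = len(D)
    pi = [1.0 / M] * M
    for _ in range(max_iter):
        nxt = [sum(pi[i] * D[i][j] for i in range(M)) for j in range(M)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")

# Example D_L: the four-state matrix used in the comparison of Section 4.
D_L = [
    [0.42, 0.18, 0.24, 0.16],
    [0.18, 0.42, 0.04, 0.36],
    [0.07, 0.03, 0.54, 0.36],
    [0.03, 0.07, 0.09, 0.81],
]

pi_L = stationary(D_L)
print([round(p, 4) for p in pi_L])
```

All entries of this example matrix are positive, so the chain is irreducible and aperiodic and the iteration converges.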

3 Wireless Channel Model at IP Layer 

The small-scale propagation model of the received signal strength defined in the previous section cannot be used directly in performance evaluation studies and must be properly extended to the IP layer, at which QoS is usually defined. To do so, we have to take into account the specific features of the layers below IP, including modulation schemes at the physical layer, data-link error concealment techniques and possible segmentation procedures between different layers. As a result, the IP layer wireless channel model must be a complex cross-layer function of the underlying layers and the propagation characteristics.

In the following subsections we define models of incorrect reception of protocol data units (PDUs) at different layers. For this purpose we implicitly assume that these PDUs are transmitted consecutively at the corresponding layers.

3.1 Bit Error Process 

Consider a certain state $i$ of the Markov chain $\{S_L(n), n = 0, 1, \dots\}$ associated with the conditional probability distribution function $F_L(k \mid i)$, $k = 1, 2, \dots, N$, of the received signal strength. Since the probability of a single bit error is a deterministic function of the received signal strength [3], all signal strength values $k$ of $F_L(k \mid i)$ that are less than or equal to a computed value of the so-called bit error threshold $B_T$ cause a bit error, while values greater than $B_T$ do not. Each state $i$, $i = 1, 2, \dots, M$, of the Markov process $\{S_L(n), n = 0, 1, \dots\}$ can therefore be associated with the following bit error probability $p_{E,i}$:

$$p_{E,i} = \Pr\{E(n) = 1 \mid S_E(n) = i\} = \sum_{k=1}^{B_T} \Pr\{L(n) = k \mid S_L(n) = i\}, \qquad (1)$$

where $\{E(n), n = 0, 1, \dots\}$, $E(n) \in \{0, 1\}$, is the bit error process, in which 1 denotes an incorrectly received bit and 0 a correctly received bit, and $\{S_E(n), n = 0, 1, \dots\}$ is the underlying Markov chain of $\{E(n), n = 0, 1, \dots\}$. Note that $\{S_L(n), n = 0, 1, \dots\}$ and $\{S_E(n), n = 0, 1, \dots\}$ are actually the same, and $\pi_E = \pi_L$, $D_E = D_L$, where $D_E$ and $\pi_E$ are the one-step transition probability matrix and the stationary distribution vector of $\{S_E(n), n = 0, 1, \dots\}$, respectively. $B_T$ must be estimated based on the modulation scheme and other specific features of the physical layer used on a given wireless channel [3].
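Eq. (1) is a cumulative sum of each state's binned signal strength distribution up to the bit error threshold. A minimal sketch, assuming the conditional distributions are stored as a list of per-state probability vectors over bins $k = 1, \dots, N$; the two-state, five-bin numbers are hypothetical and serve only as an illustration:

```python
def bit_error_probs(F, B_T):
    """Eq. (1): p_{E,i} = sum_{k=1}^{B_T} Pr{L(n) = k | S_L(n) = i}.

    F[i][k - 1] holds Pr{L(n) = k | S_L(n) = i} for bins k = 1..N.
    """
    if not 0 <= B_T <= len(F[0]):
        raise ValueError("B_T must lie between 0 and the number of bins N")
    return [sum(row[:B_T]) for row in F]

# Hypothetical two-state model with N = 5 signal strength bins.
F_L = [
    [0.00, 0.00, 0.05, 0.25, 0.70],   # state 1: strong signal
    [0.30, 0.30, 0.20, 0.15, 0.05],   # state 2: deep fades
]
p_E = bit_error_probs(F_L, B_T=2)
print(p_E)
```

Choosing $B_T$ itself (from the modulation scheme) is outside the scope of this sketch.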

Let us denote by $d_{E,ij}(k) = \Pr\{E(n) = k, S_E(n) = j \mid S_E(n-1) = i\}$, $k = 0, 1$, the transition probability from state $i$ to state $j$ with correct ($k = 0$) and incorrect ($k = 1$) bit reception, respectively. These probabilities can be represented in compact form using matrices $D_E(1)$ and $D_E(0)$ such that $D_E(1) + D_E(0) = D_E$. In our case, the state from which the transition occurs completely determines the bit error probability. The state to which the transition occurs is used for the convenience of the matrix notation employed in the following.
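Because only the originating state determines the bit error probability, $d_{E,ij}(1) = p_{E,i}\, d_{E,ij}$ and $d_{E,ij}(0) = (1 - p_{E,i})\, d_{E,ij}$, i.e. each row $i$ of $D_E$ is scaled by $p_{E,i}$ or $1 - p_{E,i}$. A minimal sketch of this split, with a hypothetical two-state chain:

```python
def split_by_error(D_E, p_E):
    """Return (D_E(0), D_E(1)) with d_{E,ij}(1) = p_{E,i} * d_{E,ij}."""
    D1 = [[p * d for d in row] for row, p in zip(D_E, p_E)]
    D0 = [[(1.0 - p) * d for d in row] for row, p in zip(D_E, p_E)]
    return D0, D1

# Hypothetical two-state chain and per-state bit error probabilities.
D_E = [[0.9, 0.1],
       [0.4, 0.6]]
p_E = [0.0, 0.6]
D_E0, D_E1 = split_by_error(D_E, p_E)
for row0, row1, row in zip(D_E0, D_E1, D_E):
    # D_E(0) + D_E(1) reproduces D_E row by row.
    print(row0, row1, [a + b for a, b in zip(row0, row1)])
```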

3.2 Frame Error Process Without FEC 

Assume that the length of a frame is constant and equal to $m$ bits. The sequence of consecutively transmitted bits, denoted by gray rectangles, is shown in Fig. 2, where $(l-1)$, $l$, $(l+1)$ denote time intervals whose length equals the time to transmit a single frame, and $k$, $i$, $j$ denote the states of the Markov chain $\{S_E(n), n = 0, 1, \dots\}$ at the beginning of these intervals.

Consider the stochastic process $\{N(l), l = 0, 1, \dots\}$, $N(l) \in \{0, 1, \dots, m\}$, describing the number of incorrectly received bits in consecutive bit patterns of length $m$. This process is a doubly stochastic one, modulated by the underlying Markov chain $\{S_N(l), l = 0, 1, \dots\}$, and $\{N(l), l = 0, 1, \dots\}$ can be completely defined via the parameters of the bit error process.

Let us denote the probability of going from state $i$ to state $j$ of the Markov chain $\{S_N(l), l = 0, 1, \dots\}$ with exactly $k$, $k = 0, 1, \dots, m$, incorrectly received bits in a bit pattern of length $m$ by $d_{N,ij}(k) = \Pr\{N(l) = k, S_N(l) = j \mid S_N(l-1) = i\}$. These transition probabilities can be found using $D_E(k)$, $k = 0, 1$, and $\pi_E$:

$$d_{N,ij}(0) = \pi_E D_E^{m}(0)\, e, \qquad d_{N,ij}(1) = \pi_E \sum_{k=0}^{m-1} D_E^{m-k-1}(0)\, D_E(1)\, D_E^{k}(0)\, e, \qquad d_{N,ij}(m) = \pi_E D_E^{m}(1)\, e, \qquad (2)$$

where $e$ is a vector of ones of appropriate size.
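The three expressions in (2) are special cases of a single counting recursion: appending one bit slot multiplies by $D_E(0)$ if the bit is correct and by $D_E(1)$ if it is in error. A minimal sketch, assuming the hypothetical two-state chain below, that builds matrices $A_k$ (transitions over $m$ bits with exactly $k$ errors) and checks $\pi_E A_k e$ against the closed forms for $k = 0, 1, m$:

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(M):
    return [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]

def error_count_matrices(D0, D1, m):
    """A[k][i][j] = Pr{k errors in m bits, end in state j | start in state i}."""
    A = [identity(len(D0))]            # zero bits sent: zero errors w.p. 1
    for _ in range(m):                 # append one bit slot at a time
        nxt = [matmul(A[0], D0)]
        for k in range(1, len(A)):
            nxt.append(matadd(matmul(A[k], D0), matmul(A[k - 1], D1)))
        nxt.append(matmul(A[-1], D1))  # every bit so far in error
        A = nxt
    return A

def pi_times_e(pi, A):
    """Scalar pi A e."""
    return sum(p * sum(row) for p, row in zip(pi, A))

def matpow(A, n):
    R = identity(len(A))
    for _ in range(n):
        R = matmul(R, A)
    return R

# Hypothetical two-state chain; pi_E = (0.8, 0.2) is its stationary vector.
D_E = [[0.9, 0.1], [0.4, 0.6]]
p_E = [0.0, 0.6]
pi_E = [0.8, 0.2]
D0 = [[(1 - p) * d for d in row] for row, p in zip(D_E, p_E)]
D1 = [[p * d for d in row] for row, p in zip(D_E, p_E)]

m = 6
A = error_count_matrices(D0, D1, m)
dist = [pi_times_e(pi_E, Ak) for Ak in A]   # Pr{N(l) = k}, k = 0..m

# Closed forms of (2) for k = 0, 1 and m.
p0 = pi_times_e(pi_E, matpow(D0, m))
p1 = sum(pi_times_e(pi_E, matmul(matmul(matpow(D0, m - k - 1), D1), matpow(D0, k)))
         for k in range(m))
pm = pi_times_e(pi_E, matpow(D1, m))
print(dist[0] - p0, dist[1] - p1, dist[m] - pm)
```

The recursion also yields the intermediate counts $k = 2, \dots, m-1$, which the threshold sums in the next subsections need.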
Fig. 2. Sequence of consecutively transmitted bits at the wireless channel.
Let us now introduce the frame error process $\{F(l), l = 0, 1, \dots\}$, $F(l) \in \{0, 1\}$, where 0 indicates correct reception of the frame and 1 denotes incorrect frame reception. The process $\{F(l), l = 0, 1, \dots\}$ is modulated by the underlying Markov chain $\{S_F(l), l = 0, 1, \dots\}$. Note that $\{S_F(l), l = 0, 1, \dots\}$ and $\{S_N(l), l = 0, 1, \dots\}$ are the same. Let us denote the probability of going from state $i$ to state $j$ of the Markov chain $\{S_F(l), l = 0, 1, \dots\}$ with exactly $k$, $k = 0, 1$, incorrectly received frames by $d_{F,ij}(k)$. The process describing the number of bit errors in consecutive frames can be related to the frame error process $\{F(l), l = 0, 1, \dots\}$ using the so-called frame error threshold $F_T$:

$$d_{F,ij}(0) = \sum_{k=0}^{F_T-1} d_{N,ij}(k), \qquad d_{F,ij}(1) = \sum_{k=F_T}^{m} d_{N,ij}(k). \qquad (3)$$

Expressions (3) are interpreted as follows: if the number of incorrectly received bits in the frame is greater than or equal to the frame error threshold ($k \geq F_T$), the frame is incorrectly received and $F(l) = 1$; otherwise ($k < F_T$) the frame is correctly received and $F(l) = 0$.

Assume that FEC is not used at the data-link layer. This means that every time a frame contains at least one bit error, it is received incorrectly ($F_T = 1$). Thus, the probabilities (3) of the frame error process take the following form:

$$d_{F,ij}(0) = d_{N,ij}(0), \qquad d_{F,ij}(1) = \sum_{k=1}^{m} d_{N,ij}(k) = 1 - d_{F,ij}(0). \qquad (4)$$

The slot durations of $\{N(l), l = 0, 1, \dots\}$ and $\{F(l), l = 0, 1, \dots\}$ are the same, $\Delta t'$, and are related to the slot duration of the received signal strength process $\{L(n), n = 0, 1, \dots\}$ as $\Delta t' = m\Delta t$.

3.3 Frame Error Process with FEC 

The frame error threshold $F_T$ depends on the correction capabilities of the FEC code. Assume that the number of bit errors that can be corrected by the FEC code is $l$. Then $F_T = l + 1$, and the frame is incorrectly received when $k \geq l + 1$; otherwise, it is correctly received. Thus, the transition probabilities (3) of the frame error process take the following form:

$$d_{F,ij}(0) = \sum_{k=0}^{l} d_{N,ij}(k), \qquad d_{F,ij}(1) = \sum_{k=l+1}^{m} d_{N,ij}(k). \qquad (5)$$



3.4 IP Packet Error Process 

Assume that an IP packet is segmented into $z$ frames of equal size at the data-link layer.¹ The sequence of consecutively transmitted frames, denoted by gray rectangles, is shown in Fig. 3, where $(h-1)$, $h$, $(h+1)$ denote time intervals whose length equals the time to transmit a single packet, and $k$, $i$, $j$ denote the states of the Markov chain $\{S_F(l), l = 0, 1, \dots\}$ at the beginning of these intervals.

Fig. 3. Sequence of consecutively transmitted frames at the wireless channel.

¹ The assumption of a constant frame size does not restrict the generality of the results as long as only one traffic source is allowed to be active at any instant of time, for which only data-link error concealment techniques are possible (e.g., IP telephony).

Consider the stochastic process $\{M(h), h = 1, 2, \dots\}$, $M(h) \in \{0, 1, \dots, z\}$, describing the number of incorrectly received frames in consecutive frame patterns of length $z$. This process is modulated by the Markov chain $\{S_M(h), h = 0, 1, \dots\}$ and can be defined via the parameters of the frame error process.

Let us denote the probability of going from state $i$ to state $j$ of the Markov chain $\{S_M(h), h = 0, 1, \dots\}$ with exactly $k$, $k = 0, 1, \dots, z$, incorrectly received frames in a frame pattern of length $z$ by $d_{M,ij}(k) = \Pr\{M(h) = k, S_M(h) = j \mid S_M(h-1) = i\}$. These transition probabilities can be found using $D_F(k)$, $k = 0, 1$, and $\pi_F$ of $\{F(l), l = 0, 1, \dots\}$ as follows:

$$d_{M,ij}(0) = \pi_F D_F^{z}(0)\, e, \qquad d_{M,ij}(1) = \pi_F \sum_{k=0}^{z-1} D_F^{z-k-1}(0)\, D_F(1)\, D_F^{k}(0)\, e, \qquad d_{M,ij}(z) = \pi_F D_F^{z}(1)\, e, \qquad (6)$$

where $\pi_F$ is the stationary distribution vector of $\{S_F(l), l = 0, 1, \dots\}$.

Let us now introduce the packet error process $\{P(h), h = 0, 1, \dots\}$, $P(h) \in \{0, 1\}$, where 0 indicates correct reception of the packet and 1 denotes incorrect packet reception. The process $\{P(h), h = 0, 1, \dots\}$ is modulated by the underlying Markov chain $\{S_P(h), h = 0, 1, \dots\}$. Note that $\{S_P(h), h = 0, 1, \dots\}$ and $\{S_M(h), h = 0, 1, \dots\}$ are the same. Let us denote the probability of going from state $i$ to state $j$ of the Markov chain $\{S_P(h), h = 0, 1, \dots\}$ with exactly $k$, $k = 0, 1$, incorrectly received packets by $d_{P,ij}(k)$. The process $\{M(h), h = 0, 1, \dots\}$ describing the number of incorrectly received frames in consecutively transmitted packets can be related to the packet error process $\{P(h), h = 0, 1, \dots\}$ using the so-called packet error threshold $P_T$:

$$d_{P,ij}(0) = \sum_{k=0}^{P_T-1} d_{M,ij}(k), \qquad d_{P,ij}(1) = \sum_{k=P_T}^{z} d_{M,ij}(k). \qquad (7)$$

Expressions (7) are interpreted as follows: if the number of incorrectly received frames in a packet is greater than or equal to the packet error threshold ($k \geq P_T$), the packet is incorrectly received and $P(h) = 1$. Otherwise, it is correctly received and $P(h) = 0$. Since no error correction procedures are defined for the IP layer, $P_T = 1$ and only $d_{M,ij}(0)$ must be computed in (6). That is, every time a packet contains at least one incorrectly received frame, the whole packet is received incorrectly.

200 D. Moltchanov, Y. Koucheryavy, and J. Harju

The slot durations of $\{P(h), h = 0, 1, \dots\}$ and $\{M(h), h = 0, 1, \dots\}$ are the same, $\Delta t''$, and are related to the slot duration of the received signal strength process $\{L(n), n = 0, 1, \dots\}$ as $\Delta t'' = zm\Delta t$.

3.5 Illustration of the Proposed Extension 

An illustration of the proposed cross-layer mapping is shown in Fig. 4, where time diagrams of $\{L(n), n = 0, 1, \dots\}$, $\{E(n), n = 0, 1, \dots\}$, $\{N(l), l = 0, 1, \dots\}$, $\{F(l), l = 0, 1, \dots\}$, $\{M(h), h = 0, 1, \dots\}$ and $\{P(h), h = 0, 1, \dots\}$ are shown. The error thresholds $B_T$, $F_T$ and $P_T$ must be estimated as outlined previously and then used to compute the transition probabilities of the error processes at the different layers.

To define models of the incorrect reception of PDUs at different layers we 
implicitly assumed that appropriate PDUs are consecutively transmitted at cor- 
responding layers. Hence, the IP packet error process is conditioned on the event 
of consecutive transmission of packets. 
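The whole chain from bits to IP packets can be sketched compactly. This is a minimal sketch under our own assumptions: transitions are kept as matrices, the frame-level matrices $D_F(0)$, $D_F(1)$ are obtained by summing the $k$-error count matrices below and above $F_T$ (a matrix form of (3)), the same counting recursion is reused at the packet level for (6) and (7), and all numeric inputs are hypothetical:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def count_matrices(D0, D1, n):
    """A[k]: transitions over n slots with exactly k 'failure' slots."""
    M = len(D0)
    A = [[[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]]
    for _ in range(n):
        nxt = [matmul(A[0], D0)]
        for k in range(1, len(A)):
            nxt.append(matadd(matmul(A[k], D0), matmul(A[k - 1], D1)))
        nxt.append(matmul(A[-1], D1))
        A = nxt
    return A

def threshold(A, T):
    """(ok, failed): fewer than T failures vs. at least T failures."""
    M = len(A[0])
    ok = [[0.0] * M for _ in range(M)]
    failed = [[0.0] * M for _ in range(M)]
    for k, Ak in enumerate(A):
        if k < T:
            ok = matadd(ok, Ak)
        else:
            failed = matadd(failed, Ak)
    return ok, failed

def prob(pi, A):
    """Scalar pi A e."""
    return sum(p * sum(row) for p, row in zip(pi, A))

# Hypothetical bit-level inputs.
D_E = [[0.9, 0.1], [0.4, 0.6]]
p_E = [0.0, 0.05]
pi_E = [0.8, 0.2]                 # stationary vector of D_E
D_E0 = [[(1 - p) * d for d in row] for row, p in zip(D_E, p_E)]
D_E1 = [[p * d for d in row] for row, p in zip(D_E, p_E)]

m, F_T = 8, 1    # 8-bit frames, no FEC
z, P_T = 2, 1    # 2 frames per packet, no IP-layer error correction

D_F0, D_F1 = threshold(count_matrices(D_E0, D_E1, m), F_T)   # bits -> frames
D_P0, D_P1 = threshold(count_matrices(D_F0, D_F1, z), P_T)   # frames -> packets

# pi_E is also stationary for D_F = D_E^m, so it serves as pi_F here.
frame_error = prob(pi_E, D_F1)
packet_error = prob(pi_E, D_P1)
print(f"frame error {frame_error:.4f}, packet error {packet_error:.4f}")
```

With $P_T = 1$ the packet-level result reduces to $1 - \pi_F D_F^{z}(0)\, e$, i.e. only the zero-error term is needed, as noted above.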







Fig. 4. Illustration of proposed cross-layer mapping. 
4 Comparison of the Proposed Approach

Let us now compare the proposed approach with the one developed and used to date. We consider two cases: (1) the signal strength is assumed to be constant during the frame transmission time; (2) the signal strength is mapped to the IP layer model in accordance with our approach. In what follows, we use subscripts 1 and 2 to denote performance estimators obtained using the corresponding approach.

Assume that the wireless channel at the physical layer is represented by a Markov chain with $M = 4$, $p_{E,i} = 0$, $i = 1, 2, 3$, $p_{E,4} \neq 0$ and the following transition probability matrix:

$$D_E = \begin{pmatrix} 0.42 & 0.18 & 0.24 & 0.16 \\ 0.18 & 0.42 & 0.04 & 0.36 \\ 0.07 & 0.03 & 0.54 & 0.36 \\ 0.03 & 0.07 & 0.09 & 0.81 \end{pmatrix}. \qquad (8)$$

We also assume that exactly one IP packet is mapped into one frame at the 
data-link layer. It can be easily shown that this assumption provides the best 
possible conditions at the IP layer. 



4.1 Packet Error Processes Without FEC 

Assume that the length of the frame is $m$ bits and FEC is not used at the data-link layer. Then, the conditional mean number of incorrectly received packets² is given by the following expressions:

$$E_1[P] = \sum_{i=1}^{4} \pi_{E,i} \sum_{k=0}^{m-1} (1 - p_{E,i})^{k}\, p_{E,i}, \qquad E_2[P] = 1 - \pi_E D_E^{m}(0)\, e. \qquad (9)$$

The estimated values of the conditional mean number of incorrectly received IP packets for different values of $m$ and $p_{E,4}$ are shown in Table 1. Comparing the obtained results, we note that the assumption of a constant received signal strength during the frame transmission time significantly overestimates the actual performance of wireless channels when the channel coherence time is comparable with the time to transmit a single symbol.
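Both estimators in (9) are straightforward to evaluate. A minimal sketch, assuming power iteration for $\pi_E$ and a hypothetical $p_{E,4} = 0.1$ with $m = 16$; these inputs are illustrative and are not the values behind Table 1:

```python
def stationary(D, iters=5000):
    """Stationary vector of a row-stochastic D by power iteration."""
    M = len(D)
    pi = [1.0 / M] * M
    for _ in range(iters):
        pi = [sum(pi[i] * D[i][j] for i in range(M)) for j in range(M)]
    return pi

def mean_packet_errors_no_fec(D, p_E, m):
    M = len(D)
    pi = stationary(D)
    # E_1: state frozen for the whole frame -> geometric sum per state.
    e1 = sum(pi[i] * sum((1 - p_E[i]) ** k * p_E[i] for k in range(m))
             for i in range(M))
    # E_2: state may change every bit -> 1 - pi D_E(0)^m e,
    # with D_E(0) = diag(1 - p_E) D.
    D0 = [[(1 - p_E[i]) * D[i][j] for j in range(M)] for i in range(M)]
    v = pi[:]                          # row vector pi D_E(0)^t
    for _ in range(m):
        v = [sum(v[i] * D0[i][j] for i in range(M)) for j in range(M)]
    return e1, 1.0 - sum(v)

D_E = [
    [0.42, 0.18, 0.24, 0.16],
    [0.18, 0.42, 0.04, 0.36],
    [0.07, 0.03, 0.54, 0.36],
    [0.03, 0.07, 0.09, 0.81],
]
p_E = [0.0, 0.0, 0.0, 0.1]             # hypothetical p_{E,4}
e1, e2 = mean_packet_errors_no_fec(D_E, p_E, m=16)
print(f"E1[P] = {e1:.4f}, E2[P] = {e2:.4f}")
```

With $p_{E,i} = 0$ for $i = 1, 2, 3$, $E_1[P]$ reduces to $\pi_{E,4}\,(1 - (1 - p_{E,4})^m)$, which the geometric sum reproduces.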



4.2 Packet Error Processes with FEC 

Consider now the effect of FEC. Assume that the FEC code can correct up to $l$ bit errors, i.e. $F_T = l + 1$. Then, the conditional mean number of incorrectly received packets is given by the following expressions:

$$E_1[P] = \sum_{i=1}^{4} \pi_{E,i} \sum_{k=F_T}^{m} \binom{m}{k} p_{E,i}^{k} (1 - p_{E,i})^{m-k}, \qquad E_2[P] = \sum_{i=1}^{4} \pi_{E,i} \sum_{k=F_T}^{m} \sum_{j=1}^{4} d_{N,ij}(k). \qquad (10)$$

² This performance parameter can be interpreted as the mean number of incorrectly received IP packets given that packets are generated according to a Bernoulli process with the probability of a single arrival set to 1.

Table 1. Conditional mean number of incorrectly received packets (no FEC)

Table 2. Conditional mean number of incorrectly received packets (FEC)

Results for different values of $l$, $m$ and $p_{E,4}$ are shown in Table 2. For illustrative purposes, some unrealistic values of $(m, l)$ are also included. Comparing the obtained results, we note that the assumption of a constant received signal strength during the frame transmission time significantly underestimates or overestimates the actual performance of the wireless channel, depending on the correction capabilities of the FEC code. Our approach provides exact values of the conditional mean number of incorrectly received packets when the channel coherence time is comparable with the time to transmit a single symbol at the wireless channel.
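With FEC, $E_1[P]$ is a per-state binomial tail, while $E_2[P]$ needs the distribution of the number of bit errors in $m$ consecutive bits when the state may change every bit. A minimal sketch, assuming a forward recursion over (error count, current state) to obtain that distribution; $p_{E,4} = 0.1$, $m = 32$ and $l = 2$ are hypothetical inputs, not those of Table 2:

```python
from math import comb

def stationary(D, iters=5000):
    """Stationary vector of a row-stochastic D by power iteration."""
    M = len(D)
    pi = [1.0 / M] * M
    for _ in range(iters):
        pi = [sum(pi[i] * D[i][j] for i in range(M)) for j in range(M)]
    return pi

def e1_fec(pi, p_E, m, F_T):
    """E_1[P] in (10): state frozen per frame, binomial number of bit errors."""
    return sum(pi_i * sum(comb(m, k) * p ** k * (1 - p) ** (m - k)
                          for k in range(F_T, m + 1))
               for pi_i, p in zip(pi, p_E))

def error_count_distribution(pi, D, p_E, m):
    """c[k] = Pr{k bit errors in m consecutive bits}, state changes per bit."""
    M = len(D)
    v = [pi[:]]          # v[k][j]: Pr{k errors so far, current state j}
    for _ in range(m):
        nxt = [[0.0] * M for _ in range(len(v) + 1)]
        for k, row in enumerate(v):
            for i in range(M):
                for j in range(M):
                    nxt[k][j] += row[i] * (1 - p_E[i]) * D[i][j]
                    nxt[k + 1][j] += row[i] * p_E[i] * D[i][j]
        v = nxt
    return [sum(row) for row in v]

def e2_fec(pi, D, p_E, m, F_T):
    """E_2[P] in (10): probability of at least F_T bit errors per frame."""
    return sum(error_count_distribution(pi, D, p_E, m)[F_T:])

D_E = [
    [0.42, 0.18, 0.24, 0.16],
    [0.18, 0.42, 0.04, 0.36],
    [0.07, 0.03, 0.54, 0.36],
    [0.03, 0.07, 0.09, 0.81],
]
p_E = [0.0, 0.0, 0.0, 0.1]      # hypothetical p_{E,4}
pi = stationary(D_E)
m, l = 32, 2                    # hypothetical frame length and FEC capability
F_T = l + 1
print(e1_fec(pi, p_E, m, F_T), e2_fec(pi, D_E, p_E, m, F_T))
```

No general ordering between the two estimators is assumed here; as the text notes, it depends on the FEC capability.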

5 Conclusions and Future Work 

We extended the small-scale propagation model representing the received signal strength of the wireless channel to the IP layer using cross-layer mappings. Our model is represented by the IP packet error process and retains the memory properties of the initial signal strength process. We compared results obtained using our model with those presented in the literature and showed that our approach yields more accurate estimates of IP packet error probabilities when the channel coherence time is comparable with the time to transmit a single symbol.

The proposed model relies entirely on the classic small-scale propagation model and does not take into account the signal strength attenuation caused by movements of the user over large distances. The aim of our further work is to extend our approach to mobility-dependent propagation characteristics.
Abstract. Multiprotocol Label Switching (MPLS) offers a simple and flexible
transport solution for multiservice networks. Therefore, many UMTS operators
are currently considering the use of an MPLS backbone. However, the efficient
transport of short voice and data packets in the UMTS access network requires 
multiplexing and segmentation functions not provided by MPLS. This paper 
investigates the use of ATM Adaptation Layer 2 (AAL2) over MPLS to per- 
form both functions. The efficiency of AAL2/MPLS is analyzed for different 
traffic types and compared with other transport options. The results indicate 
that significant capacity savings can be obtained with this solution. 



1 Introduction 

The 3G Universal Mobile Telecommunications System (UMTS) is a network de- 
signed to support multiple applications (telephony, video-conferencing, audio and 
video streaming, games, Web access, e-mail, etc.) with very different traffic patterns 
and quality of service (QoS) requirements. The initial UMTS architecture defined by 
the Third Generation Partnership Project (3GPP) comprises a Wideband CDMA radio 
interface, an access network based on Asynchronous Transfer Mode (ATM), and a 
core network evolved from 2G networks. The core has two domains. The circuit- 
switched domain, founded on GSM, handles voice and other circuit-mode traffic. The 
packet-switched domain, derived from GPRS, handles IP traffic. This structure, de- 
scribed in the 3GPP Release 99 specifications, is progressively changing towards a 
unified model based on IP (Releases 4, 5, and 6). 

In contrast with the core network, the UMTS Terrestrial Radio Access Network 
(UTRAN) relies on an integrated transport infrastructure for all traffic types (voice, 
data, etc.). The UTRAN protocol architecture is divided into a Radio Network Layer 
(RNL), designed specifically for UMTS, and a Transport Network Layer (TNL) that 
reuses existing transport technologies. In Releases 99 and 4 the TNL consists of ATM 
connections with ATM Adaptation Layer type 2 (AAL2) or 5 (AAL5). See Table 1. 

Release 5 specifications [1], [2] allow two TNL alternatives: ATM, as outlined above, or IP with UDP in the user plane and SCTP (Stream Control Transmission Protocol) for signaling. IP version 6 is mandatory and IP version 4 is optional, although a dual stack is recommended. The default encapsulation for IP packets is PPP (Point-to-Point Protocol) with HDLC framing, but other layer 2 protocols are also possible.

Table 1. UTRAN interfaces and ATM Adaptation Layers used. The air interface (Uu) does not use ATM

Interface                                          User Plane   Control Plane
Iub: Node B - Radio Network Controller (RNC)       AAL2         AAL5
Iur: Serving RNC - Drift RNC                       AAL2         AAL5
Iu-CS: S-RNC - Mobile Switching Center (MSC)       AAL2         AAL5
Iu-PS: S-RNC - Serving GPRS Support Node (SGSN)    AAL5         AAL5

A typical UTRAN topology will have many Node Bs, often connected by low- 
capacity links. Therefore, the efficiency of the protocol stack in the Iub interface 
between Node B and RNC becomes an important design issue. Two main problems 
must be solved. Firstly, UDP, IP, and layer 2 headers add a large overhead to voice 
packets, even if header compression procedures are used. This problem can be allevi- 
ated by multiplexing voice communications. In this way, several short voice packets 
are concatenated before transport, so the overhead of the headers added after the 
concatenation is shared by all the packets in the group. Secondly, long packets may 
introduce unacceptable delay variations when transmitted over a slow link. To avoid 
this, packets that exceed a certain maximum size are segmented before transmission. 
Note that segmentation reduces efficiency, but it may be necessary to meet QoS re- 
quirements (delay variation). 

The structure of the rest of this paper is as follows. Sect. 2 reviews various solu- 
tions proposed to implement concatenation and segmentation in the Iub protocol 
stack. Sect. 3 presents a new proposal that relies on AAL2 [3] for packet concatena- 
tion/segmentation, and MPLS [4] for transport. Sect. 4 evaluates the performance of 
AAL2/MPLS for voice and data, and compares it with other options. Finally, Sect. 5 
summarizes the contributions of the paper. 



2 Alternatives for Transport in the UTRAN 

ATM in the UTRAN is studied in [5] and [6], focusing on traffic modeling, QoS 
analysis, and simulation experiments with voice and data traffic. The 3GPP has inves- 
tigated IP transport in the UTRAN in an ad hoc group created within the Technical 
Specification Group Radio Access Network, Working Group 3 (TSG RAN WG3), 
which is responsible for the overall UTRAN design. A technical report [7] presents 
the conclusions of this work and makes some general recommendations for Release 5 
specifications. 

Different Iub stacks based on IP are simulated and compared in [8]. The concate- 
nation and segmentation functions mentioned in the previous section may be located 
above UDP/IP or below it. Protocols such as Composite IP (CIP) and Lightweight IP 
Encapsulation (LIPE) [9] concatenate and, optionally, segment packets above 
UDP/IP. Alternatively, both functions may be implemented at layer 2 by using 
PPPmux [10] for concatenation, and the multi-class, multi-link extensions to PPP
(MC/ML-PPP) [11] for segmentation. This is the option recommended by 3GPP in 
[7]. The application of the Differentiated Services model in the UTRAN is addressed 
in several studies. For example, in [12] voice calls are multiplexed with CIP and 
transferred with the Expedited Forwarding Per-Hop Behavior [13]. 

While these studies indicate that IP can be a viable alternative to ATM for the 
UTRAN, it is necessary to implement a complicated protocol stack to achieve the 
required performance. Firstly, UDP/IP headers have to be compressed to reduce the overhead added to short packets (e.g. voice). Many compression algorithms have been proposed, for example IP Header Compression, Compressed RTP, Enhanced Compressed RTP, Robust Header Compression, and others described in RFCs or Internet drafts. For references and a discussion of the advantages and drawbacks of each method, see [7]. PPP encapsulation and framing overhead must also be minimized by using the simplified header formats foreseen in the standard.

Secondly, PPPmux and MC/ML-PPP must be implemented. When there are inter- 
mediate IP routers between Node B and RNC, PPPmux and MC/ML-PPP may be 
terminated at the edge router (ER) next to the Node B or at the RNC. In the former 
case, header compression, concatenation, and segmentation are applied only in the 
last hop between the Node B and the ER, where capacity is likely to be more limited. 
Packets are routed individually between the ER and the RNC. In the latter case, those 
functions are moved to the RNC, so the efficiency gains are maintained all the way up 
to the RNC. The disadvantage of this approach is that layer 2 frames must be tunneled 
between the Node B and the RNC. This implies that another IP header is added to the 
information transported across the network. See Fig. 1. 

The Radio Network Layer (RNL) includes other protocols not shown in Fig. 1, namely Medium Access Control, Radio Link Control, and Packet Data Convergence Protocol. Non-access stratum protocols, which are transparent to the UTRAN, are located on top of the RNL, for example IP packets exchanged between the user equipment and the Gateway GPRS Support Node (GGSN) in the UMTS core network, plus the higher layer protocols required by the user applications (TCP, HTTP, etc.). With the architecture of Fig. 1, these IP packets would be encapsulated with two additional IP headers, resulting in poor efficiency.

Fig. 1. UDP/IP transport stack in the user plane of the Iub interface, with end-to-end concatenation/segmentation between Node B and RNC

The use of MPLS in the UTRAN has been considered mainly in terms of protocol 
functionality, for example: mapping of traffic flows to MPLS Label Switched Paths 
(LSPs) and signaling for LSP establishment [14], combination of MPLS traffic engi- 
neering (MPLS-TE) functions with the Differentiated Services model (DiffServ) [15], 
and mobility support [16], [17]. These references point out potential advantages of 
MPLS for transport in the UTRAN, but no numerical results are presented. 

The MPLS/Frame Relay Alliance (MFA) defined an ad-hoc multiplexing protocol 
for efficient transport of voice calls over MPLS [18]. More recently, the MFA has 
proposed to use AAL protocols (without ATM cells) for voice trunking [19] and for 
TDM circuit emulation [20] over MPLS networks. 



3 AAL2/MPLS Transport in the UTRAN 

This section proposes a simpler TNL architecture with two main components: AAL2 
and MPLS. See Fig. 2. Specifically, we propose to use the Service Specific 
Segmentation and Reassembly (SSSAR) sublayer [21], which is more adequate for 
integrated voice and data transport, instead of the convergence sublayer for trunking 
[22] included in previous proposals that focused on voice traffic only. 




Fig. 2. AAL2/MPLS transport protocol stack in the user plane of the Iub interface

Our solution, explained in more detail below, combines standard protocols in such 
a way that each one performs the essential function it was designed for, and comple- 
ments the other. AAL2 is used to multiplex many variable-bit-rate, delay-sensitive 
traffic flows, and MPLS provides a flexible tunnelling mechanism without the over- 
head of ATM. The result is a simple protocol architecture that offers significant ad- 
vantages in comparison with other solutions (see Sect. 3.1.) 

The Common Part Sublayer (CPS) [3] of AAL2 concatenates voice and data packets. Each CPS packet has a 3-byte header and a maximum payload length of 45 (default) or 64 bytes. See Fig. 3. The SSSAR sublayer accepts data units of up to 65568 bytes and segments them into pieces no longer than the maximum length admitted by CPS. The last segment of each packet is marked in the UUI bits of the CPS header, so the SSSAR adds no extra overhead. CPS and SSSAR are used in ATM-based UTRANs as defined in the Release 99 and Release 4 specifications. CPS packets are mapped to cells sent over ATM virtual connections at the Iub interface. The first byte of each cell payload, called the Start Field, indicates the next CPS packet boundary. The time that a partially filled cell waits for the next CPS packet is limited by Timer_CU (Combined Use). If it expires, the cell is completed with padding bytes and transmitted.

Fig. 3. Concatenated AAL2 CPS packets over MPLS. Layer 2 overhead and the MPLS label(s) (4 bytes each) precede a sequence of CPS packets, each consisting of a 3-byte header (I.363.2) and a 1..64-byte payload, followed by layer 2 overhead. CPS packet header fields: CID, Channel Identifier (8 bits); LI, Length Indicator (6 bits); UUI, User-to-User Indication (5 bits); HEC, Header Error Control (5 bits).
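The 3-byte CPS header packs four fields: CID (8 bits), LI (6 bits), UUI (5 bits) and HEC (5 bits). A minimal sketch of packing and unpacking it, most significant bit first, assuming (as in ITU-T I.363.2) that LI carries the payload length minus one; the HEC value is simply passed through, since computing the actual 5-bit CRC is outside this sketch, and the function names are hypothetical:

```python
def pack_cps_header(cid: int, payload_len: int, uui: int, hec: int = 0) -> bytes:
    """Pack CID(8) | LI(6) | UUI(5) | HEC(5) into 3 bytes, big-endian."""
    if not 0 <= cid < 256:
        raise ValueError("CID must fit in 8 bits")
    if not 1 <= payload_len <= 64:
        raise ValueError("CPS payload must be 1..64 bytes")
    if not (0 <= uui < 32 and 0 <= hec < 32):
        raise ValueError("UUI and HEC are 5-bit fields")
    word = (cid << 16) | ((payload_len - 1) << 10) | (uui << 5) | hec
    return word.to_bytes(3, "big")

def unpack_cps_header(h: bytes) -> dict:
    """Inverse of pack_cps_header."""
    word = int.from_bytes(h, "big")
    return {"cid": word >> 16,
            "payload_len": ((word >> 10) & 0x3F) + 1,
            "uui": (word >> 5) & 0x1F,
            "hec": word & 0x1F}

# A 40-byte voice payload on an example channel identifier.
hdr = pack_cps_header(cid=8, payload_len=40, uui=0)
print(hdr.hex(), unpack_cps_header(hdr))
```

The 6-bit LI field is what limits a CPS payload to 64 bytes.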

In the proposed AAL2/MPLS solution, AAL2 performs the same functions as in 
the AAL2/ATM case, except that CPS packets are not mapped to cells. They are 
concatenated up to Timer_CU expiration or up to a given maximum length Lmax 
(e.g. set to comply with a maximum transmission time in the Node B - Edge Router 
link), and transmitted over an LSP preceded by the MPLS label stack. CPS payloads 
contain Frame Protocol data units, segmented by SSSAR if necessary. 
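The segmentation and concatenation steps described above can be sketched as follows. This is a simplified illustration under our own assumptions: SSSAR-style segmentation into CPS payloads of at most 45 bytes with a last-segment flag (the actual UUI code points are not modeled), greedy concatenation of CPS packets until the next one would exceed $L_{max}$ (Timer_CU expiry is represented only by flushing the final partial group), $L_{max}$ bounding only the CPS bytes and not the MPLS or layer 2 overhead, and a hypothetical rule deriving $L_{max}$ from a maximum transmission time on the Node B - Edge Router link:

```python
CPS_HEADER = 3          # bytes per CPS packet header
MAX_CPS_PAYLOAD = 45    # default maximum CPS payload (64 is also allowed)
MAX_SSSAR_SDU = 65568   # largest data unit accepted by SSSAR

def segment(sdu_len, max_payload=MAX_CPS_PAYLOAD):
    """Split one SDU into CPS payload sizes; the flag marks the last segment."""
    if not 0 < sdu_len <= MAX_SSSAR_SDU:
        raise ValueError("SDU length out of range")
    segments, remaining = [], sdu_len
    while remaining > 0:
        n = min(remaining, max_payload)
        remaining -= n
        segments.append((n, remaining == 0))
    return segments

def concatenate(payload_sizes, l_max):
    """Greedily group CPS packets so that each group is at most l_max bytes."""
    groups, current, used = [], [], 0
    for p in payload_sizes:
        size = CPS_HEADER + p
        if size > l_max:
            raise ValueError("a single CPS packet exceeds L_max")
        if used + size > l_max:      # L_max reached: close the group
            groups.append(current)
            current, used = [], 0
        current.append(p)
        used += size
    if current:                      # stands in for Timer_CU expiry
        groups.append(current)
    return groups

def l_max_from_delay(max_tx_time_us, link_rate_bps):
    """Hypothetical rule: bytes that fit in the allowed transmission time."""
    return max_tx_time_us * link_rate_bps // (8 * 1_000_000)

l_max = l_max_from_delay(2000, 2_048_000)    # 2 ms on a 2.048 Mbit/s link
voice = concatenate([40] * 20, l_max)        # twenty 40-byte voice payloads
data = segment(100)                          # one 100-byte data unit
print(l_max, [len(g) for g in voice], data)
```

In this example each 40-byte voice payload becomes a 43-byte CPS packet, so eleven of them fit under a 512-byte limit before a new group is started.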



3.1 Advantages of the Proposed Solution 

AAL2/MPLS is conceptually very similar to AAL2/ATM. ATM virtual connections 
and related traffic management procedures are replaced by MPLS LSPs, possibly 
supporting traffic engineering and class-of-service differentiation (DiffServ-Aware 
MPLS Traffic Engineering or DS-TE) [23]. AAL2/MPLS is considerably more effi- 
cient, because the ATM cell header overhead is eliminated. See the results of Sect. 4. 

The AAL2 protocol is implemented only at the end points (Node B and RNC), and 
intermediate ATM switches are not needed. Since AAL2 is still used, the interface 
offered by the Transport Network Layer to the Radio Network Layer is the same as in 
UMTS Releases 99 and 4. Moreover, the standard signaling procedures used to establish and release AAL2 channels (Access Link Control Application Part, ALCAP) can be reused.

Compared with IP-based alternatives, AAL2/MPLS is simpler. IP tunnels, header 
compression, PPP multiplexing, and Multi-Class Multi-Link PPP are not required in 
the TNL (although IP is still used by applications in the non-access stratum, as men- 
tioned in Sect. 2). In the control plane, new signaling procedures (provisionally 
named “IP-ALCAP” by 3GPP) are not needed, because the standard ALCAP is used. 
Regarding efficiency, AAL2/MPLS is comparable to the best IP-based solutions, with values in the 90-95% range. See Sect. 4.
4 Performance Evaluation

The header formats and procedures of the different protocols mentioned in the pre- 
ceding sections have been analyzed in order to evaluate the efficiency of alternative 
transport stacks for the UTRAN. The analysis considers a simple case where every 
concatenated packet has the same length. The results presented in Sect. 4.1 and 4.2 
show the improvements that can be obtained with the AAL2/MPLS solution. 

A higher transport efficiency should allow the network to carry more traffic main- 
taining the QoS levels or, alternatively, to reduce the capacity needed to carry a given 
amount of traffic with the required QoS. To verify this assumption, we simulated each protocol stack in more detail, transporting various mixes of voice and Web
data traffic in the Iub interface between base stations (Node Bs) and controllers 
(RNCs). Optimizing this interface is particularly important for the operators due to 
the high number of Node Bs that must be connected in a typical UMTS network, so 
we focused the simulation study on it. However, the solution proposed here can be 
used in other UMTS interfaces as well. Sect. 4.3 gives a sample of the simulation 
results obtained. 



4.1 Analysis of Efficiency for Voice Traffic 

In this case, we analyze the transport of voice payloads of constant size P=40 bytes,
corresponding to 32 bytes generated by the Adaptive Multi-Rate (AMR) codec used in UMTS,
plus 8 bytes added by the RNL [5]. The background noise description packets sent by
the AMR codec during silence periods are not considered in the analysis. The number 
of concatenated packets per group (N) is equal to the number of voice connections in 
the active state. Segmentation is not necessary. 

The overhead per packet (H) and the overhead per group (Hg) take different values 
depending on the transport protocol stacks considered. For example, AAL2/MPLS 
over PPP with simplified HDLC framing (AAL2/MPLS/PPP/HDLC) gives H=3
bytes and Hg=9 bytes. UDP/IP with headers compressed to 4 bytes and concatenation 
at layer 2 (cUDPIP/PPPmux/HDLC) gives H=Hg=5 bytes. The details of the model 
and the complete set of parameter values used can be found in [24]. 
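To make the roles of H and Hg concrete, the sketch below assumes the simplest reading of these definitions: a group of N packets carries N·P payload bytes plus N·H per-packet and Hg per-group overhead bytes. This is our illustration, not the exact model of [24]; the function name is hypothetical, and the H and Hg values are the two examples quoted above.

```python
def group_efficiency(n: int, p: int, h: int, hg: int) -> float:
    """Fraction of transmitted bytes that are RNL payload (assumed model).

    n:  packets concatenated in one group (active voice connections)
    p:  RNL payload per packet, bytes
    h:  transport overhead added to every packet, bytes
    hg: transport overhead added once per group, bytes
    """
    payload = n * p
    return payload / (payload + n * h + hg)


P = 40  # voice payload of Sect. 4.1: 32 B AMR + 8 B RNL
for n in (1, 5, 10, 20):
    aal2_mpls = group_efficiency(n, P, h=3, hg=9)  # AAL2/MPLS/PPP/HDLC
    cudpip = group_efficiency(n, P, h=5, hg=5)     # cUDPIP/PPPmux/HDLC
    print(f"N={n:2d}  AAL2/MPLS={aal2_mpls:.3f}  cUDPIP/PPPmux={cudpip:.3f}")
```

Under this model the per-group overhead Hg is amortized as N grows, and the efficiency tends to P/(P+H), i.e. 40/43 ≈ 0.93 for AAL2/MPLS.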

Fig. 4 shows the efficiency (number of RNL bytes transported divided by the total 
number of bytes transmitted by the TNL) as a function of the number of active voice 
connections. The best alternative is AAL2/MPLS, with efficiency above 90% even 
for moderate values of N. The efficiency of UDP/IP with headers compressed to 4 
bytes (cUDPIP) over PPPmux is not as good, because concatenation at layer 2 gives a 
higher overhead per packet (H). If the end-to-end configuration between Node B and 
RNC illustrated in Fig. 1 is used, the extra IP tunnel increases the overhead per group 
(Hg) and reduces the efficiency, especially when there are few packets per group. 

The curve labeled MPLSx2/PPP/HDLC corresponds to the case where AAL2 is
not used and each packet is transmitted with a stack of two MPLS labels: the inner label
is used as a channel identifier, and the outer one serves to route the packets to their
destination. The two labels (4 bytes each) plus the PPP/HDLC header add a total of
13 bytes to each voice payload of 40 bytes. Therefore, the efficiency is 40/53=75.5%,
which happens to be the same value obtained with AAL2/ATM when N voice payloads
(N·40 bytes) are transmitted in N ATM cells (N·53 bytes). This is true for
N<12. When N=12, 12 voice payloads are carried in 12 concatenated CPS packets
(12·43=516 bytes), which still fit in 11 cells (11·47=517 bytes), so the efficiency of
AAL2/ATM increases to (12·40)/(11·53)=82.3%, as illustrated in Fig. 4.

Fig. 4. IP and MPLS transport options vs. AAL2/ATM (voice traffic)
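The cell-packing arithmetic above can be checked with a short sketch. It assumes, as the text does, that CPS packets (3-byte header + 40-byte payload) are packed back to back and may straddle cell boundaries, and that each 53-byte cell offers 47 bytes after the 1-byte AAL2 start field; timer-driven partially filled cells are ignored. The function name is ours.

```python
import math

ATM_CELL = 53      # bytes per ATM cell on the wire
CELL_PAYLOAD = 47  # 48-byte cell payload minus the 1-byte AAL2 start field
CPS_HEADER = 3     # AAL2 CPS packet header, bytes


def aal2_atm_efficiency(n: int, p: int = 40) -> float:
    """Efficiency when n CPS packets of payload p are packed into ATM cells."""
    cells = math.ceil(n * (p + CPS_HEADER) / CELL_PAYLOAD)
    return n * p / (cells * ATM_CELL)


for n in (1, 11, 12):
    print(n, round(aal2_atm_efficiency(n), 3))
```

For every N below 12, N CPS packets of 43 bytes still need N cells, which is why the efficiency stays at 40/53 until N=12 saves one cell.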

Other transport options not shown in the graph give poor results. For example, the
efficiency of a simple UDP/IP/PPP/HDLC stack without header compression or
multiplexing is only 55.6%. This value corresponds to IP version 4 headers (20
bytes). With IP version 6 headers (40 bytes) it would be even lower: 43.5%. If IP is
sent over AAL5/ATM instead of PPP/HDLC, each voice packet is encapsulated in 2
cells and the efficiency drops further to 37.7%.
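The AAL5 figure follows from padding each packet plus the 8-byte AAL5 trailer to whole 48-byte cell payloads. The sketch assumes no additional encapsulation header (VC multiplexing); the function name is ours.

```python
import math


def aal5_efficiency(payload: int, headers: int, trailer: int = 8) -> float:
    """One packet in AAL5: payload + headers + 8-byte trailer, padded to
    whole 48-byte cell payloads, with 53 bytes sent per cell."""
    cells = math.ceil((payload + headers + trailer) / 48)
    return payload / (cells * 53)


# 40-byte voice payload + UDP (8 B) + IPv4 (20 B) = 68 B, plus trailer = 76 B -> 2 cells
print(round(aal5_efficiency(40, 8 + 20), 3))  # 0.377
```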



4.2 Analysis of Efficiency for Data Traffic 

Fig. 5 shows the transport efficiency for data payloads of variable size P up to 1500 
bytes. As explained in previous sections, short packets may be concatenated up to a 
maximum size Lmax, and packets longer than Lmax are segmented before transmis- 
sion. In Fig. 5 Lmax has been set to 1000 bytes. H is the overhead per packet or seg- 
ment and Hg is the overhead per group, with different values for each protocol stack 
as in the previous case. 
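The interplay between concatenation and segmentation can be illustrated with a deliberately simplified, hypothetical model for a stream of identical packets of size P. Packets no longer than Lmax are concatenated floor(Lmax/P) to a group, and longer packets are split into ceil(P/Lmax) segments, each paying H and Hg. This is our own sketch with illustrative overheads. In particular, it does not model the small CPS packet size used by AAL2 segmentation, so it is not expected to reproduce the curves of Fig. 5.

```python
import math


def data_efficiency(p: int, h: int, hg: int, lmax: int = 1000) -> float:
    """Hypothetical model of concatenation/segmentation efficiency.

    p <= lmax: lmax // p packets are concatenated into one group.
    p >  lmax: each packet is split into ceil(p / lmax) segments, and
               every segment is sent as its own group (h + hg each).
    """
    if p <= lmax:
        k = lmax // p  # packets per group
        return k * p / (k * (p + h) + hg)
    segments = math.ceil(p / lmax)
    return p / (p + segments * (h + hg))


# Illustrative overheads only (h=2, hg=5), not values taken from the paper.
for size in (100, 500, 1000, 1500):
    print(size, round(data_efficiency(size, h=2, hg=5), 4))
```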

With compressed UDP/IP (cUDPIP) and MPLS using a 2-label stack (MPLSx2), 
segmentation is done at layer 2 by MC/ML-PPP. Both exhibit a high efficiency for 
long data packets, although they were not as good for short voice packets (compare 
the corresponding curves in Fig. 4 and Fig. 5). In MPLSx2, the internal label identi- 
fies each data channel, and the external one is used for routing. AAL2/MPLS reaches 
approximately the same value, close to 95%, for both voice and data. In this case, 
MC/ML-PPP is not needed because AAL2 takes care of segmentation at the SSSAR 
sublayer, so AAL2/MPLS uses the default PPP/HDLC encapsulation. 

Fig. 5 indicates that MPLSx2 outperforms AAL2/MPLS for packets longer than
approximately 250 bytes. Therefore, in scenarios where the traffic mix includes significant
amounts of short voice packets (a few tens of bytes) and long data packets (a
few hundreds of bytes or more), the overall efficiency can be optimized by separating
the traffic of each Node B into two Label Switched Paths: LSP 1 for voice with AAL2,
and LSP 2 for data without it. In this case, and assuming that voice packets are always
smaller than 65 bytes, the SSSAR part of AAL2 can be removed in LSP 1. Packets
sent via LSP 2 are segmented, if necessary, by Multi-Class Multi-Link PPP. Note that
the network operator may prefer to use separate LSPs for voice and for data anyway,
even if AAL2 is used in both.

Fig. 5. IP and MPLS transport options vs. AAL2/ATM (data traffic)

4.3 Simulation Results 

The Iub transport protocol stacks considered in the preceding sections have been 
simulated with voice and Web traffic, in order to compare the delay and loss performance
of the different options. The simulator was developed in C, and we are currently
porting it to OPNET Modeler [25].

The traffic source modules generate Frame Protocol data units (see Figs. 1 and 2). 
The data unit sizes and inter-arrival times are set taking into account the relevant
characteristics of the UMTS radio access bearers used, as well as the overheads added
by the RNL protocols. The transmission rate and the Transmission Time Interval
(TTI) are the most decisive parameters. In our simulations, voice is coded at 12.2
kbit/s with TTI=20 ms. Including the RNL overhead, this corresponds to one FP data 
unit of 40 bytes every 20 ms. During silence periods the data unit size is reduced to 
13 bytes. Web pages are downloaded at 64 kbit/s with TTI=40 ms, which corresponds 
to one data unit of 331 bytes every 40 ms, also including RNL overhead. For 
AAL2/MPLS, the simulator can be configured to use the same LSP for all traffic, or 
separate LSPs for voice and for data. In these experiments we chose the latter option. 
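The source parameters above translate directly into the bit rate each active user offers to the Iub before any TNL overhead is added. The helper below is ours and only performs that arithmetic (one FP data unit per TTI).

```python
def iub_rate_bps(unit_bytes: int, tti_ms: int) -> float:
    """Bit rate offered to the Iub by one source emitting a single FP data
    unit of unit_bytes (RNL overhead included) every tti_ms milliseconds."""
    return unit_bytes * 8 * 1000 / tti_ms


voice_bps = iub_rate_bps(40, 20)  # 12.2 kbit/s AMR voice, TTI = 20 ms, talkspurt
web_bps = iub_rate_bps(331, 40)   # 64 kbit/s Web bearer, TTI = 40 ms
print(voice_bps, web_bps)
```

As a rough upper bound, 1.92 Mbit/s divided by 16 kbit/s allows 120 simultaneous voice talkspurts counting RNL bytes only. Each transport stack's efficiency lowers that number further, which is why the more efficient stacks leave room for more users.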

Fig. 6 is a sample of the results obtained. The curves show the Iub delay (0.95- or
0.99-quantile) vs. the number of active users. The capacity available for user traffic is
set to 1.92 Mbit/s, and delays are measured in the RNC-to-Node B direction (Web
traffic is higher in this direction). As anticipated by the previous analysis of efficiency,
AAL2/MPLS gives the lowest delay for voice. MPLS without AAL2 is a
good option for data traffic (right) but not for voice (left), where it gives a much
higher delay than the other options. For a given maximum delay in the Iub interface,
these curves may be used to estimate the number of users that can be served with the
available capacity.

Fig. 6. Simulation of delay at the Iub interface for different transport options

A detailed description of the simulated scenarios and additional results, which can- 
not be presented here due to lack of space, can be found in [5,24]. 



5 Conclusions 

The UMTS transport layer is expected to migrate from ATM to packet-switched
architectures that can provide the required quality of service at a lower cost. In this
scenario, AAL2 (with SSSAR and CPS) over MPLS is a simple solution that offers
functionality similar to ATM, but in a more flexible and efficient way. This paper has
discussed the main issues in transporting voice and data traffic from base stations
connected with low-capacity links, and has compared the performance of
AAL2/MPLS with other proposals, both analytically and by simulation. Although the 
study focused on the access network, MPLS infrastructure may also be used in the 
UMTS core, and may even be shared by traffic from external IP networks. 

AAL2/MPLS is more efficient than AAL2/ATM (typical differences are between
12% and 20%), so a larger fraction of the available capacity is dedicated to carrying user
traffic, and more customers can be served with the required QoS. Our simulator can
be used to estimate the benefits of AAL2/MPLS in practical scenarios with different 
traffic loads and multiplexing strategies. AAL2/MPLS is particularly well suited for 
short packets, which are the most affected by protocol overhead. 
Abstract. This paper investigates the case of a wireless sensor network
deploying Bluetooth technology. The paper analyses the main technological
features of Bluetooth, such as error recovery, power consumption and
connection establishment, and uses experimental results to analyse their
effect on the Quality of Service in the sensor network. The paper presents a
case for a different view of QoS in sensor networks. We argue that network
availability, reliability and especially the originality (freshness) of data have
emerged as crucial QoS issues in wireless sensor networking. The paper puts
emphasis on master-side scheduling in Bluetooth and uses simulation to analyse
the performance of three scheduling schemes in a sensor network model. We
analyse the performance of the schemes in symmetric and asymmetric load
environments and with different bit error probabilities. Two of the schemes are
specially designed for sensor networks, and one of them, Maximum Burst
Delay First, shows very good results in the asymmetric load environment.



1 Introduction 

Bluetooth [1][2] is a wireless communication technology specially designed to
replace wires in short-range applications. Bluetooth operates at 2.4 GHz, using
frequency-hopping spread spectrum baseband technology to minimise interference.
The Bluetooth specification defines a radio frequency interface and the set of
communication protocols for device discovery, data exchange and error correction.
The link speed, communication range (less than 10 m), and transmit power level for
Bluetooth were chosen to support low-cost, power-efficient, single-chip
implementations.

Initially, Bluetooth was designed to replace wires in PC systems, e.g. between
mouse and PC or keyboard and PC. More recently, the concept of ubiquitous computing
has emerged. Ubiquitous computing primarily refers to distributed systems of usually
small computing devices able to communicate, exchange data and monitor each
other’s activities. Examples of ubiquitous systems are sensor networks and smart
home networks. Sensor networks are networks of small devices which have limited
functionality and only basic communication abilities. The transmitting power of
sensors must be minimal, in order to simplify the maintenance of sensor networks and
minimise the cost. For this reason, short-range, low-consumption Bluetooth
emerged as one of the most interesting technologies for communication in
sensor networks.

In a Bluetooth piconet (the smallest Bluetooth network) one device must always be the
master and the other devices are slaves. Each device can operate as either master or
slave. One master can communicate with at most 7 active slaves. If more than 7 slaves are in
the piconet, the others cannot be active, i.e. they must be in a low-power mode.

In a typical Bluetooth application a master probes the environment searching for 
slaves. The discovery procedure is explained in more detail in section 3.2. Once the 
slaves have been found, the master polls them. The exact scheduling policy is not
defined in the specification, and scheduling is an important research issue in
Bluetooth. It is important to stress that the slaves can communicate only by replying
to the master’s polling packet. This is often cited as one of the main drawbacks of
Bluetooth: if two slaves want to communicate, they have to do it through the master.
Other disadvantages include the connection and piconet/scatternet
establishment delays, and limited connection availability due to interference.

For an application to make use of Bluetooth technology, it must tolerate
delay, be satisfied with the throughput provided, and find the range adequate. The
use of Bluetooth in wireless sensor networks has already been investigated [3][4][5].
There is a general consensus that for Bluetooth to be used in sensor systems, a
number of modifications need to be made. This paper contributes to the ongoing
discussion about the use of Bluetooth in sensor systems by providing a parallel
analysis of Bluetooth features that influence the Quality of Service in sensor systems.
When it comes to sensor systems, the QoS analysis has to be done in a different way
than with other communication systems. This paper presents a new approach to
understanding QoS in wireless sensor systems, and proceeds to evaluate the
performance of Bluetooth using both simulation and an experimental testbed. We
analyse three different scheduling schemes through simulation and derive conclusions 
on the optimal scheduling scheme for a sensor network. 



2 Quality of Service in Wireless Sensor Networks 

The traditional view of Quality of Service (QoS) in communication networks is 
concerned with end-to-end delay, packet loss, delay variation and throughput. 
Numerous architectures, scheduling techniques and admission control solutions have 
been presented in an effort to achieve guaranteed levels of network performance. 
Other performance-related features, such as network reliability, availability,
communication security and robustness, are often neglected in QoS research.
Matters are somewhat different in wireless personal and local area networks. 802.11
wireless LANs suffer greatly from problems in availability and reliability. The
Quality of Service provided by these networks depends significantly on the level of
interference in the environment, where the interference comes from other wireless and
microwave devices, such as mobile phones, neon lights or microwave ovens.
The situation with Bluetooth is slightly better, considering the short-range nature of
Bluetooth communication. Still, overcoming interference and achieving satisfactory
network availability present a major challenge in the design of local wireless network
systems. With respect to local wireless networks, we can define network availability
not only as the ability of user hosts to connect to the network, but also as their ability
to obtain at least some minimal communication rate. Network reliability can be defined
in terms of the number of times the network “is down” due to significant interference
or just bad signal reception.



3 Bluetooth 

3.1 Interference, Error Correction, and Transport Control 

Network availability and reliability in wireless networks are primarily determined
by the way the network reacts to packet losses. Bluetooth
technology achieves reliability by retransmitting packets. Each Bluetooth packet
carries a header with an acknowledgement bit in it. More specifically, every packet
header contains an ARQN flag indicating the status of the previously received packet.
ARQN=1 (ACK) means the packet has been received and correctly decoded.
ARQN=0 (NAK) means the previous reception failed. A NAK occurs when the slave’s
response to a master transmission has been lost, when the HEC (Header Error Check)
fails, or when the CRC fails, the last being the usual reason for a retransmission during
a connection. When a device receives an indication that the last packet was corrupted, it
simply retransmits the packet. The slave responds in the slave-to-master slot
directly following the master-to-slave slot; the master responds the next time it
addresses the same slave. Retransmission continues until the sender receives an
acknowledgement that the packet got through correctly.

Bluetooth uses sequence numbers to inform the receiver whether the packet is 
being retransmitted or being sent for the first time. 
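The interaction between ARQN and the sequence number can be sketched as stop-and-wait ARQ with a 1-bit SEQN. This is a minimal sketch under our own assumptions. Data packets and returning acknowledgements are lost independently with the same probability p_loss, and the function name is hypothetical. A lost acknowledgement causes a retransmission that the receiver recognises as a duplicate from the unchanged SEQN and discards.

```python
import random


def stop_and_wait(packets, p_loss, rng):
    """Deliver packets over a lossy link using ARQN acknowledgements and a
    1-bit sequence number (SEQN). Returns (delivered, transmissions)."""
    seqn = 0        # SEQN carried by the sender's current packet
    expected = 0    # SEQN the receiver expects for a new packet
    delivered = []
    transmissions = 0
    for pkt in packets:
        while True:
            transmissions += 1
            if rng.random() < p_loss:
                continue  # packet corrupted: receiver answers ARQN=0, resend
            if seqn == expected:
                delivered.append(pkt)  # first copy: pass it up
                expected ^= 1
            # otherwise a duplicate: discard it, but acknowledge again
            if rng.random() < p_loss:
                continue  # ACK lost: sender retransmits the same packet
            break  # ARQN=1 received
        seqn ^= 1  # the next packet carries the toggled SEQN
    return delivered, transmissions


delivered, tx = stop_and_wait(list(range(10)), p_loss=0.2, rng=random.Random(7))
print(delivered, tx)
```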

If a packet is lost, the sender keeps a timer, and when the
timeout expires without an acknowledgement from the receiver, the packet is
retransmitted. This slows down the communication substantially.

To analyse the impact of interference and bit errors on an end-to-end file transfer
application, we have done a simple experiment. Our experimental testbed consisted of
two notebook computers equipped with Bluetooth hardware and software. Fig. 1
shows the achieved goodput in transferring large files (380 KB, 1.2 MB and 3.6 MB).
We observe two environments: one in which there is no interference, and one in
which an 802.11 WLAN Access Point is active in close proximity to the
notebooks. The Access Point is only transmitting control signals, with no data transfer
happening in the WLAN, i.e. the level of interference is reasonably small. We can see
how goodput decreases as the distance between the notebooks increases. It is also
possible to quantify the impact of interference: even with low-load interference,
the impact on end-to-end file transfer time can be substantial. This result is interesting
because it reflects the raw file transfer time observed at the application layer.

In a wireless sensor network, for example, a chunk of data is
fragmented into packets and sent over a Bluetooth link. In the case of high interference
and high packet loss, the packets will be repeatedly retransmitted, which will
substantially increase the overall transfer time. In a sensor network, the maximum
transfer time matters: if the data being transmitted is measurement data,
there is likely to be a deadline after which the
measurement data is considered useless.
In addition, the Bluetooth specification defines a flush
timeout: the amount of time in milliseconds that a device will
spend trying to transmit a packet segment before it gives up. If a segment does
not get through before the flush timeout is exceeded, the whole packet is flushed.
The flush timeout therefore controls how many times the
baseband can retransmit the packet.




Fig. 1. Throughput measurements (goodput vs. distance, 0.5 m to 5 m)
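Once the flush timeout has been turned into a number of transmission attempts, the chance that a packet is flushed follows from the per-attempt error probability. How many attempts fit into the timeout depends on how often the master revisits the slave, so the sketch takes the attempt count as a parameter. It assumes independent bit errors, error-free acknowledgements, and all 366 bits of a full DH1 packet unprotected, which overstates losses because the packet header is FEC-coded. The function names are ours.

```python
DH1_BITS = 366  # full DH1: 72-bit access code + 54-bit header + 240-bit payload field


def packet_error_prob(bits: int, pe: float) -> float:
    """Probability that at least one of `bits` independent bits is in error."""
    return 1.0 - (1.0 - pe) ** bits


def flush_prob(bits: int, pe: float, attempts: int) -> float:
    """Probability that all `attempts` transmissions fail, so the packet
    is flushed (returning acknowledgements assumed error-free)."""
    return packet_error_prob(bits, pe) ** attempts


for n in (1, 2, 4, 8):
    print(n, flush_prob(DH1_BITS, 1e-3, n))
```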



3.2 Connection Establishment and Service Discovery 

The complicated and slow process of connection establishment is one of the main 
disadvantages of the Bluetooth system. There are two delays in setting up a Bluetooth 
link. First, it takes time to discover devices in the neighbourhood. A second delay 
occurs in setting up the connection itself. 

For two Bluetooth devices to start exchanging data, the nodes must first discover each
other. The master sends small inquiry packets, hopping frequencies at twice the
rate used in a normal connection. Slaves must be in inquiry scan mode, changing the
frequency very slowly. When a slave receives two consecutive inquiry packets, it
waits for a random time and then responds. Before the two devices can establish a
connection, they must be in page and page scan modes, respectively. The paging device
initiates the connection, while the page scanning device responds.

The whole process is slow: the Bluetooth specification defines an inquiry time of
10.24 seconds, and in practice it takes around 2.5 seconds to establish a Bluetooth link
[6]. This makes Bluetooth far from optimal for dynamic wireless systems that need
fast response and fast connection establishment. Delays of several seconds are very long
for computer systems, and this is a major drawback in applying Bluetooth technology to
a number of future dynamic distributed systems.

Connection times can be lengthy, as transmitters and receivers need to synchronize 
before communication can commence. These limitations would have serious 
consequences if the wireless link were of a critical nature - for example, a ‘panic 
button’, a life-dependent medical monitor, or an engine management system. When it
comes to sensor networks, for fixed nodes connection establishment should not be a 
problem, since once the connection is established it remains active. The problem 
occurs for mobile sensors, when the long duration of the connection establishment 
and short range may mean that the mobile sensor will be out of reach before the 
connection is fully set up. 





218 



V. Rakocevic et al. 



3.3 Power Consumption 

The small range of Bluetooth networks enables Bluetooth devices to spend minimal 
power to communicate. In addition to this, Bluetooth devices can switch to one of the 
three low power modes to further decrease the power consumption. The low power 
modes can also be used to form a dynamic piconet with more than 7 slaves - some 
slaves can be in low power modes while 7 slaves are continuously active. 

Bluetooth specification defines three low power modes: Hold, Sniff and Park. Hold 
mode allows devices to be inactive for a single short period of time. Sniff mode 
allows devices to be inactive except for periodic sniff slots. Both Hold and Sniff 
modes are used in the cases when a node does not have continuous data to send, but 
does not want to disconnect from the network. Park mode is similar to Sniff mode,
except that parked devices give up their active member address. A device in Park
mode wakes up periodically to listen to broadcast packets, and it can be unparked
at these instants. For additional power saving, the Bluetooth specification also provides
the option to switch to a less accurate low-power oscillator which then drives the 
Bluetooth clock in the low power mode. 

The existence of low power modes is essential for sensor networks, where the 
power consumption is critical. 



3.4 Scheduling and Packet Types 

Bluetooth supports two types of communication links - the Asynchronous 
Connectionless (ACL) links and the Synchronous Connection-oriented (SCO) links. 

The ACL link provides packet-switched communication between a master and a
slave when data arrives from the upper layers. Essentially, data communication in
Bluetooth is achieved over ACL links. On these links the master always transmits in
even-numbered slots and the slave replies in odd-numbered slots, where a slot time is 625 µs.

The master only transmits data when there is something to send. A slave may only
transmit in response to a packet addressed to it by the master. In the simulation analysis
that follows we assume that all the links are ACL: wireless sensor networks using
Bluetooth will be data communication networks and will use ACL links.

SCO links are essentially voice links. They provide reserved link bandwidth and 
regular periodic exchange of data in the form of reserved slots. If an SCO link is in 
operation, then that slave must be communicated with regularly according to the SCO
repetition rate, T_SCO.

Two packet types are used on ACL links: DM and DH packets. DM stands for
Data Medium rate; these packets use FEC for error correction. DH stands for Data High
rate; these packets do not use FEC. There are options to use 1-slot, 3-slot and 5-slot
packets. Table 1 gives data rates for asymmetric communication between a
master and a slave. In our analysis, we will be looking at the case of asymmetric data
flow where the slaves are sending the data to the master, and all the packets will be
single-slot DH1 packets.

In a Bluetooth piconet, multiple master-slave connections are served at the master
in a round-robin fashion. The slaves are polled one by one, regardless of their traffic
rates and queue sizes. Such a scheme is only efficient in low-load environments or
when the incoming traffic is symmetric in all slaves. The master will transmit to and
receive from each of the slaves that are active at the time. If there is nothing to send,
the master may either omit that slave or transmit a NULL packet.

Table 1. Data rates and packet sizes

Packet   Max payload   Max asymmetric data rate   Max asymmetric data rate
type     (bytes)       forward (kbps)             reverse (kbps)
DM1       17           108.8                      108.8
DH1       27           172.8                      172.8
DM3      121           387.2                       54.4
DH3      183           585.6                       86.4
DM5      224           477.8                       36.3
DH5      339           723.2                       57.6
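The rates in Table 1 follow from the 625 µs slot. In the asymmetric case, an n-slot packet in one direction is answered by the 1-slot packet of the same family (DM1 or DH1) in the other, so one exchange lasts n+1 slots. The sketch below reproduces the table from the payload sizes; the function and dictionary names are ours.

```python
SLOT_S = 625e-6  # Bluetooth baseband slot duration, seconds

# packet type -> (max payload in bytes, slots occupied)
PACKETS = {
    "DM1": (17, 1), "DH1": (27, 1),
    "DM3": (121, 3), "DH3": (183, 3),
    "DM5": (224, 5), "DH5": (339, 5),
}


def asymmetric_rates_kbps(ptype: str) -> tuple[float, float]:
    """(forward, reverse) rate when ptype is sent in one direction and
    answered by the 1-slot packet of the same family (DM1 or DH1)."""
    payload, slots = PACKETS[ptype]
    reply_payload, reply_slots = PACKETS[ptype[:2] + "1"]
    cycle_s = (slots + reply_slots) * SLOT_S  # duration of one exchange
    forward = payload * 8 / cycle_s / 1000
    reverse = reply_payload * 8 / cycle_s / 1000
    return forward, reverse


for ptype in PACKETS:
    fwd, rev = asymmetric_rates_kbps(ptype)
    print(f"{ptype}: forward {fwd:.1f} kbps, reverse {rev:.1f} kbps")
```

For DM5 the exact forward value is 477.87 kbps, so the sketch prints 477.9 where Table 1 lists the truncated 477.8.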

Much work is being done in the research community to design and implement
new scheduling mechanisms. The schemes presented in the research literature all aim
to avoid empty slots in Bluetooth scheduling, i.e. to avoid the cases in which
the master polls a slave that has no data to send. The schemes differ in
whether polling is done in cycles, in which each slave is guaranteed at least
one polling packet per cycle, or whether there is no cycle at all.

For example, Capone et al. [7] analyse a number of polling schemes in an attempt
to find an optimal exhaustive scheme. The schemes they present differ in the process
of identifying the next slave to be served; they use the queue lengths at both the
master and the slaves to identify the next slave. Das et al. [8] analyse three different
scheduling schemes based on a maximum time limit within which the
master has to serve a slave. In their analysis, each slave has a polling interval and is
served within that interval. The polling interval for each slave is dynamic: it is
longer for slaves with empty queues and shorter for slaves with high traffic load.
Lapeyrie and Turletti [9] analyse the efficiency and fairness of polling schemes. They
calculate the slave priority as a linear combination of the probability that the queues are
non-empty and the number of slots since the slave was last polled.

All of these schemes have in common the need to prioritise among the slaves, and 
to use the queue length and traffic load to calculate the slave priority. Average waiting 
time (average packet delay) is the main metric used for the evaluation of these 
schemes. With respect to the analysis given in section 2, we can say that in wireless 
sensor networks the network Quality of Service should not be assessed on the basis of 
average packet delay but rather on the transfer delay of a sequence (burst) of packets. 
This is the main reason why in this paper we experiment with schemes that are 
optimised for dealing with bursts of data. 

In the remainder of this section, we present three schemes that are used in the 
simulation. In the network model used for our simulation, sensor nodes are modelled 
as Poisson sources of constant-length bursts of packets. 

Scheme 1: Exhaustive scheduling (ES). In this scheme, the master polls a slave for 
as long as there is data in the slave’s buffer. After that, the master moves on to serve 
the next slave in the cycle. The order of slaves is always the same. This is a standard 
polling scheme and we are using it to assess the benefit of introducing scheduling 
schemes optimised for constant-length burst traffic. 

Scheme 2: Limited exhaustive scheduling (LES). In this scheme, the master polls the
slave for as long as the slave buffer contains packets belonging to one data burst.
Once a burst is served, the master moves to the next slave in the cycle and, in the case of
a non-empty queue, serves the next-in-line burst from that slave. We expected this
scheme to outperform the exhaustive scheme in terms of fairness in the presence of
asymmetric traffic load.

Scheme 3: Maximum burst delay first (MBDF). In this scheme the master serves 
the slave that has a data burst that has been waiting the longest. The master serves that 
slave until the burst of data is transferred, recalculates the delays and starts serving the 
slave with the maximum burst delay. This is different from Scheme 2 because the 
slaves are not being served in a cycle, but there is a recalculation of the waiting time 
after every data burst. We expected this scheme to outperform the other two. The only
problem with this scheme is the implementation, since the master can know the burst
arrival times only when the queues are never empty. If one queue becomes
empty, there is no way for the master to find out about the arrival of a new data burst. The
MBDF scheme as defined here is therefore ideal and will have to be modified to be
successfully implemented in a real system. However, the scheme is very important for
QoS provision in sensor networks, since it is based on the idea that priority in
the service order should be given to the sensor node with the highest probability of
its data being flushed (arriving too late, becoming old and useless).
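The three service rules can be contrasted on a static snapshot of queued bursts. This is our own simplification. No new bursts arrive while the queues drain, each burst is one indivisible service unit, and MBDF is given the ideal knowledge of head-of-line arrival times that, as noted above, a real master would lack. The function name is hypothetical.

```python
from collections import deque


def service_order(policy, queues):
    """Return the order in which the master serves queued bursts.

    policy: "ES", "LES" or "MBDF"
    queues: one list per slave of burst arrival times, oldest first
    Result: list of (slave, arrival_time), one entry per served burst.
    """
    qs = [deque(q) for q in queues]
    n = len(qs)
    slave = 0  # position in the fixed polling cycle (ES and LES only)
    order = []
    while any(qs):
        if policy == "MBDF":
            # Ideal MBDF: serve the head-of-line burst that has waited longest.
            slave = min((i for i in range(n) if qs[i]), key=lambda i: qs[i][0])
            order.append((slave, qs[slave].popleft()))
            continue
        if policy not in ("ES", "LES"):
            raise ValueError(f"unknown policy {policy!r}")
        if qs[slave]:
            if policy == "ES":
                # Exhaustive: keep polling until this slave's buffer is empty.
                while qs[slave]:
                    order.append((slave, qs[slave].popleft()))
            else:
                # Limited exhaustive: serve exactly one burst, then move on.
                order.append((slave, qs[slave].popleft()))
        slave = (slave + 1) % n  # next slave in the cycle
    return order


snapshot = [[0.0, 0.5], [0.1], [0.2, 0.3]]
for policy in ("ES", "LES", "MBDF"):
    print(policy, service_order(policy, snapshot))
```

In this snapshot ES serves slave 0's second burst (arrived at 0.5) before older bursts of other slaves, whereas MBDF serves bursts strictly by age.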



4 Simulation Results 

We used a discrete-event simulator written in C++ to evaluate the performance of the
three schemes. The simulated network consists of 7 slaves that generate traffic in
bursts of 50 DH1 packets. We assume that we are dealing with a network of
temperature sensors or surveillance cameras that send periodic fresh information
about the process or phenomenon monitored. The bursts are generated according to a
Poisson process. The master has no data of its own to send; it only polls
the slaves and receives data from them. We use a simple independent error model in
which bit errors happen at random with probability P_e. Das et al. [8] present a more
accurate two-state Markov error model, but in the preliminary simulations given here
a simpler model has been used.
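The traffic and error models described above can be sketched in a few lines. Bursts from one slave arrive as a Poisson process, generated from exponential inter-arrival times. Under the independent bit-error model, a packet survives only if none of its bits is hit. The function names and the per-slave rate parameter are ours; the mapping from the load values used in the experiments to a burst rate is not given here, so the rate is left as an input.

```python
import random

DH1_BITS = 366  # full DH1 packet: 72-bit access code + 54-bit header + 240-bit payload field


def burst_arrivals(rate_per_s: float, horizon_s: float, rng: random.Random) -> list[float]:
    """Arrival times of bursts from one Poisson source on [0, horizon_s)."""
    times = []
    t = rng.expovariate(rate_per_s)  # exponential inter-arrival times
    while t < horizon_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


def packet_survives(pe: float, rng: random.Random, bits: int = DH1_BITS) -> bool:
    """Independent bit errors with probability pe: the packet is received
    correctly only if none of its bits is hit (header FEC ignored)."""
    return rng.random() < (1.0 - pe) ** bits


rng = random.Random(42)
arrivals = burst_arrivals(10.0, 60.0, rng)
ok = sum(packet_survives(1e-3, rng) for _ in range(1000))
print(len(arrivals), ok)
```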

In terms of performance evaluation metrics, in accordance with the analysis given in
section 2, we are looking at per-burst performance and use average burst delay as the
main performance parameter. Another parameter we use is the flush rate: the
percentage of bursts that arrived at the master with an overall queuing and
transmission delay greater than some Tmax, where we have used 0.5 s and 1 s as
values for Tmax in our simulations. It is critical for a wireless sensor network that fresh
information arrives at the end-hosts. The flush probability is therefore
very important for assessing network availability in sensor systems. If flushing of
data happens often, the Quality of Service in the system decreases.
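Both per-burst metrics can be computed from the time each burst was generated and the time its last packet reached the master. The helper below is a minimal sketch with a hypothetical name; a burst counts as flushed when its total delay exceeds Tmax.

```python
def burst_metrics(arrivals, completions, t_max):
    """Average burst delay and flush rate.

    arrivals[k]:    time burst k was generated at the slave (s)
    completions[k]: time its last packet reached the master (s)
    t_max:          freshness deadline Tmax (s); later bursts count as flushed
    """
    if len(arrivals) != len(completions) or not arrivals:
        raise ValueError("need one completion time per burst")
    delays = [c - a for a, c in zip(arrivals, completions)]
    avg_delay = sum(delays) / len(delays)
    flush_rate = sum(d > t_max for d in delays) / len(delays)
    return avg_delay, flush_rate


print(burst_metrics([0.0, 1.0, 2.0], [0.2, 1.7, 2.4], t_max=0.5))
```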

The simulation experiments cover the cases of symmetric and asymmetric traffic
load. The traffic load in the experiments is high, which creates an environment where
scheduling becomes critical. In the first experiment (Fig. 2 and Fig. 3), the average
traffic load is 0.8 in all slaves, the bit error probability varies from 10^-2 to 10^-5, and the
flush time Tmax is 1 s. In the second experiment (Fig. 4 and Fig. 5), the traffic load
remains symmetric, but varies from 0.6 to 0.9 with a constant error rate of 10^-3, and
Tmax = 0.5 s. In the third experiment (Fig. 6 and Fig. 7), one of the slaves has a load of
0.9 while the other slaves have a load of 0.6, the error rate varies, and Tmax = 0.5 s.

For the asymmetric load experiment, a metric called the fairness index is used to
measure the performance of the slave with asymmetric load. For example, let the
average burst delay for slave i be d_i, i = 1, ..., 7. Then the fairness index for slave 1
is

$$f_1 = 1 - \frac{d_1}{\frac{1}{7}\sum_{i=1}^{7} d_i} \qquad (1)$$



Since the objective for both the average delay and the flush probability is to be
minimal, the smaller the fairness index, the worse the performance. A negative fairness
index means that the slave in question performs worse than the average. In that case,
the scheduling scheme ignores the high load of that slave
and does not prioritise it in any way.
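The sign convention above can be made concrete with a short sketch. It assumes that the fairness index of slave k compares its metric with the mean over all slaves, f_k = 1 - x_k / mean(x), where x is either the average burst delay or the flush rate; this form is our assumption, chosen because it is negative exactly when the slave does worse than average, as described above.

```python
from typing import Sequence


def fairness_index(values: Sequence[float], k: int) -> float:
    """Fairness index of slave k for a metric to be minimised (delay or flush rate).

    Assumed form: f_k = 1 - values[k] / mean(values).
    Positive: slave k does better than the average slave; negative: worse.
    """
    mean = sum(values) / len(values)
    return 1.0 - values[k] / mean


if __name__ == "__main__":
    delays = [1.0, 2.0, 3.0]           # hypothetical average burst delays of three slaves
    print(fairness_index(delays, 0))   # 0.5: slave 0 is better than average
    print(fairness_index(delays, 2))   # -0.5: slave 2 is worse than average
```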

Overall, the results show that, in terms of average burst delay, exhaustive
scheduling outperforms the other two schemes, even though the latter are designed
for constant-length burst traffic. This happens under both asymmetric and
symmetric load. In terms of the flush rate, the MBDF scheme is the best.
The Limited Exhaustive scheme gives better delay performance than the MBDF
scheme under symmetric load.

Fig. 6 and Fig. 7 show that the Limited Exhaustive scheme performs poorly under
asymmetric load. The other two schemes perform similarly, with
MBDF better in flush rate and the Exhaustive scheme better in average delay. It is
important to note that both Exhaustive and MBDF scheduling yield a positive
fairness index, i.e. they both give priority to the highly loaded slave. While for the
Exhaustive scheme this is just a consequence of the queue sizes, the MBDF scheme
clearly prioritises the highly loaded slave by serving it much more frequently than the
other slaves. This observation deserves further investigation.

For symmetric load, the difference in performance between the Exhaustive and
Limited Exhaustive schemes is minimal; additional experiments will show the
impact of longer bursts on this. Also for symmetric load, the MBDF scheme performs much
worse than the other two. The only aspect of performance where the MBDF scheme
shows good results is in giving more priority to the heavily loaded slave. More
experiments are needed to find out how relevant and how beneficial this is.

Another interesting observation is that, for very high symmetric load
(>0.9), the Limited Exhaustive scheme starts performing worse than the MBDF scheme,
although in terms of the flush rate it remains dominant.

Overall, the results are somewhat surprising, since the ‘standard’ exhaustive
scheme outperformed the other two in our experiments. Further work is
required to evaluate this conclusion.
Fig. 2. Average burst delay, symmetric load 0.8.
Fig. 3. Percentage of flushed bursts (flush time 1 sec), load 0.8.
Fig. 4. Average burst delay, symmetric load.
Fig. 5. Flush rate, symmetric load, Tmax = 0.5 sec.
Fig. 6. Average delay fairness index, asymmetric load.
Fig. 7. Flush rate fairness index, asymmetric load.



5 Conclusion 

This paper analysed the use of Bluetooth in sensor networks. Bluetooth is not an obvious
solution for sensor networks, because of slow connection establishment, high
complexity and high power consumption in the process of moving from a low-power
mode to a connected mode. However, with some modifications Bluetooth can be used
in sensor networks. The paper analysed in detail scheduling schemes that can be used
in Bluetooth-based sensor networks and presented simulation results showing only a
limited benefit from introducing sensor-network-specific scheduling schemes. Our
preliminary conclusion is that implementing a scheduling scheme specially tailored
for sensor networks does not bring much benefit in terms of QoS. Only in highly
asymmetric loading environments did the new scheduling schemes prove to be beneficial.
Abstract. We present a novel distributed algorithm which provides QoS
by enabling only cycle-free routes, which are known to ensure network
stability. Network stability is synonymous with QoS, as it allows maximum
delays and backlogs to be bounded deterministically. Cycles are avoided by
disabling the use of pairs of input-output links around a node (turns).
This method improves network utilization compared to previous solutions,
which avoid cycles by forbidding the use of whole links.

The algorithm can be applied in conjunction with any routing algorithm, as it
does not require knowledge of the whole network topology, which reduces the
communication overhead compared to centralized approaches.

The performance of the proposed algorithm has been compared against a
centralized optimization solution. Even though the centralized solution
exhibits a lower percentage of prohibited turns, the difference is quite
moderate. We also show how our protocol can be enhanced to tolerate
fail-stop node crashes without having to restart from the beginning.



1 Introduction 

To provide Quality of Service (QoS) in packet switched networks, bounds on the 
maximum end-to-end delay have to be provided. It is known [1] that when no 
link is fully loaded, network stability (i.e. bounded backlogs at any node) may 
deterministically ensure maximum end-to-end delays and maximum backlogs 
(which avoids packet loss). 

In the last few years, much of the analysis of stability has been done by
using an adversarial packet injection approach, which was initially proposed by
Borodin et al. [2] and Andrews et al. [1]. The adversarial model is a worst-case
packet injection model in which the adversary injects packets and chooses the
route for each packet. The adversarial model was an improvement over the use
of probabilistic packet injection schemes, as it reduces the whole set of possible
injection scenarios. In a seminal paper, using an adversarial model, Andrews
et al. [1] provide a list of universally stable packet scheduling policies
(i.e. policies that guarantee network stability if the traffic load at each link
is lower than or equal to the link’s capacity). They showed that policies such as
Farthest-to-Go (FTG), Nearest-to-Source (NTS), Shortest-in-System (SIS) and
Longest-in-System (LIS) are universally stable. In contrast, they also showed
that packet scheduling policies like First-in-First-Out (FIFO), Last-in-First-Out
(LIFO), Nearest-to-Go (NTG) and Farthest-from-Source (FFS) are not universally
stable. Furthermore, LIFO, NTG and FFS can be made unstable at arbitrarily
low load rates [2]. Recently, it has also been shown that the same applies
to FIFO [3,4], the scheduling policy analyzed in this paper, which is by far the
most widely used to schedule packets.

Based on knowledge of the routes, Echagüe et al. [5] show that any
work-conserving scheduling policy (i.e. any policy that always schedules a packet
if there is one in the queue) is stable if the load rate is not bigger than 1/d
(where d is the largest number of links crossed by any packet). They also improved
that bound to 1/(d-1) for system-wide time priority scheduling policies (i.e., policies
under which a packet arriving at a buffer at time t has priority over any other
packet that is injected after time t).

On the other hand, regarding session-oriented networks
(i.e. networks where all packets belonging to the same session follow the same
route), Andrews [6] provides an example which shows that FIFO can be
unstable under the (σ, ρ)-regulated session model of Cruz [7]. However, it turns
out that, in session-oriented networks, if the total load at each link
is smaller than 1, then any network whose routes do not create cycles of
interdependent packets is stable [7].

A technique used to guarantee that no cycles will appear consists of pro- 
hibiting the use of some network resources. A simple approach to transform a 
graph into a cycle-free graph is to construct a spanning tree and prohibit the 
use of links not belonging to the tree. Whereas a spanning tree keeps the graph 
connectivity, it is quite inefficient since it prohibits the use of whole links and 
links close to the root get congested. 
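The spanning-tree approach can be sketched as follows: build a tree (here by breadth-first search from an arbitrary root, which is only one possible choice) and prohibit every link outside it. The small example graph is hypothetical. Whatever tree is chosen, exactly |E| - (|V| - 1) links are lost, and in the example every path between nodes 1, 2 and 3 is forced through the root.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def spanning_tree_prohibition(n: int, edges: List[Edge], root: int = 0) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (tree_links, prohibited_links) for a BFS spanning tree rooted at `root`."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    tree: Set[Edge] = set()
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:            # first time v is reached: keep link (u, v)
                seen.add(v)
                tree.add((min(u, v), max(u, v)))
                queue.append(v)
    all_links = {(min(u, v), max(u, v)) for u, v in edges}
    return tree, all_links - tree        # every non-tree link is prohibited


if __name__ == "__main__":
    # Hypothetical 4-node graph: a square 0-1-2-3 with the chord 0-2.
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    tree, prohibited = spanning_tree_prohibition(4, edges)
    print(sorted(tree))        # [(0, 1), (0, 2), (0, 3)]: every path now crosses node 0
    print(sorted(prohibited))  # [(1, 2), (2, 3)]
```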

Another approach consists of using the up/down routing algorithm [8]. The
up/down routing algorithm constructs a graph based on a spanning tree and,
instead of prohibiting the use of certain links, it prohibits the use of pairs of
links around a node (turns). This algorithm performs better than the spanning
tree, but it also suffers from load imbalance; furthermore, the performance of
the resulting topology depends on the initial spanning tree.

Starobinski et al. [9] propose an optimization solution (called Turn-Prohibition,
TP) that breaks cycles by prohibiting the use of network turns. Their
performance analysis shows that TP outperforms the spanning tree algorithm.
They also show that the number of turns prohibited by TP is at most a third
of the total number of turns in the network. However, the TP algorithm is
centralized and requires knowledge of the whole network topology.
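To put the one-third bound in perspective, the total number of turns can be counted as the sum over nodes of d_v(d_v - 1)/2, since every unordered pair of links at a node v of degree d_v forms one turn. The sketch below applies this count to the example graph of Figure 3-a, whose edge list we have reconstructed from the cycles listed in Section 2.2 (an assumption, since the figure itself is not reproduced here); the 5 turns that DTP prohibits in that example also stay below a third of the total.

```python
from collections import Counter
from math import comb
from typing import List, Tuple


def total_turns(edges: List[Tuple[int, int]]) -> int:
    """Number of turns: one per unordered pair of links incident to the same node."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(comb(d, 2) for d in degree.values())


# Figure 3-a edge list, reconstructed from the cycles listed in Section 2.2 (assumption).
FIG3_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3)]

if __name__ == "__main__":
    t = total_turns(FIG3_EDGES)
    print(t, t / 3)                                  # 19 turns; one third is about 6.33
    print(f"{100 * 5 / t:.1f}% prohibited by DTP")   # 5 of 19 turns -> 26.3%
```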
In this paper, we propose a new algorithm that guarantees that routes do not
create cycles. Similarly to Starobinski et al. [9], it breaks cycles by prohibiting the
use of network turns. However, unlike the TP algorithm, our algorithm
(which we call Distributed Turn Prohibition, DTP) does not require nodes
to have full knowledge of the network topology. Although TP performs better
than DTP in terms of the number of prohibited turns (which is to be expected, since TP
has global knowledge of the network topology and DTP does not), we show that
the difference is quite moderate (less than 5%). Furthermore, we also present an
extension of DTP that tolerates network failures in a dynamic fashion.

The rest of the paper is organized as follows. In Section 2 we present the system model
and the distributed turn prohibition (DTP) algorithm, as well as an application
example. Section 3 provides a formal proof of correctness. In Section 4 we analyze
performance, and in Section 5 we present a self-configuring version
of the DTP algorithm that tolerates network failures. Finally, in Section 6 we
present our conclusions and outline some future work.



2 Distributed Turn Prohibition Algorithm (DTP) 

We model the network topology as a graph G = (V,E), composed of a set of
vertices V and a set of edges E, which represent network nodes and bidirectional
links, respectively. Thus, we use the terms node and vertex, and the terms
link and edge, interchangeably.

We denote an edge between nodes n_i and n_j as e_{i,j}. A path is defined as a list
of nodes {n_i, n_j, n_k, ..., n_p} such that e_{i,j}, e_{j,k}, ... ∈ E. A packet is said to
follow a path with a cycle in G if the path contains the same edge at least twice.
For instance, a packet following the path {0, 1, 2, 4, 0, 1} in the graph of Figure
1 traverses edge e_{0,1} twice. Note that a packet following a path can traverse
the same node several times without creating a cycle.
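The definition above translates directly into a check over consecutive node pairs. A minimal sketch, treating e_{i,j} as an undirected edge as in the text (the function name is ours):

```python
from typing import List


def path_has_cycle(path: List[int]) -> bool:
    """True if the path traverses some edge e_{i,j} (in either direction) at least twice."""
    seen = set()
    for u, v in zip(path, path[1:]):
        edge = frozenset((u, v))     # e_{i,j} and e_{j,i} denote the same link
        if edge in seen:
            return True
        seen.add(edge)
    return False


if __name__ == "__main__":
    print(path_has_cycle([0, 1, 2, 4, 0, 1]))  # True: e_{0,1} is traversed twice
    print(path_has_cycle([0, 1, 2, 0, 3]))     # False: node 0 repeats, but no edge does
```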

We say that packets or flows are interdependent if they share at least one
link in their respective paths. A path of interdependent packets can be obtained
by following a packet’s path and allowing a switch to another packet’s path at the
links shared by the interdependent packets. A set of interdependent packets can
create a cycle of interdependent packets in G if, starting from a node belonging
to the set of nodes traversed by one of those packets, it is possible to obtain a
path of interdependent packets which contains the same link at least twice. For
example, in the graph of Figure 1 a packet from flow F0 follows path {0, 1, 2},
a packet from flow F1 follows path {1, 2, 3}, a packet from flow F2 follows path
{2, 3, 4} and a packet from flow F4 follows path {4, 0, 1}. The four packets create
a cycle of interdependent packets with path {0, 1, 2, 3, 4, 0, 1}.

From now on, when we refer to cycles, we mean cycles formed by a
single packet or cycles of interdependent packet flows.

Note that the graph shown in Figure 1 is not unstable, although it contains a
cycle. The existence of cycles does not imply network instability, as it has been
shown that a ring topology is universally stable. The absence of cycles, however,
implies network stability [7].
Fig. 1. Cycle of interdependent packet flows; the path which contains the cycle is
{0, 1, 2, 3, 4, 0, 1}. There are four flows: flow F0’s path is {0, 1, 2}, flow F1’s path is
{1, 2, 3}, flow F2’s path is {2, 3, 4} and flow F4’s path is {4, 0, 1}.



A pair of input-output links around a node is called a turn. We represent
a turn around n_j by the three-tuple (n_i, n_j, n_k), where n_i, n_j, n_k ∈ V and
e_{i,j}, e_{j,k} ∈ E. The prohibition of turn (n_i, n_j, n_k) means that no packet can
be forwarded from node n_i to node n_k by means of node n_j, or from node n_k
to node n_i by means of node n_j. That is, no packet can traverse link e_{j,k} after
traversing e_{i,j} or, in the other direction, no packet can traverse link e_{i,j} after
traversing link e_{j,k}.
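The definition can be turned into a simple path filter. A sketch, with a function name of our own; the example turn (0, 4, 1) is the one prohibited in the example of Section 2.2.

```python
from typing import List, Set, Tuple

Turn = Tuple[int, int, int]  # (n_i, n_j, n_k): turn around n_j


def path_allowed(path: List[int], prohibited: Set[Turn]) -> bool:
    """True if no two consecutive links of `path` form a prohibited turn.

    Prohibiting (n_i, n_j, n_k) blocks both n_i -> n_j -> n_k and n_k -> n_j -> n_i.
    """
    for a, v, b in zip(path, path[1:], path[2:]):
        if (a, v, b) in prohibited or (b, v, a) in prohibited:
            return False
    return True


if __name__ == "__main__":
    forbidden = {(0, 4, 1)}
    print(path_allowed([0, 4, 1], forbidden))  # False
    print(path_allowed([1, 4, 0], forbidden))  # False: the reverse direction is blocked too
    print(path_allowed([0, 4], forbidden))     # True: the links themselves stay usable
```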

An efficient approach to breaking cycles in a network is based on the
prohibition of turns, as opposed to the prohibition of full links. Next we present a
novel distributed algorithm which provides network stability by prohibiting
turns in a network topology.



2.1 Distributed Turn Prohibition Algorithm 

In this section, we propose a distributed turn-prohibition algorithm, called DTP,
that guarantees that (i) all cycles within the network will be broken and (ii)
the network will remain connected after the algorithm completes.

Our algorithm performs two independent tasks: (i) it builds a spanning tree 
ensuring that all nodes execute the algorithm, and, (ii) it forbids some turns 
so that the resulting network topology is free of cycles. In order to guarantee 
that nodes will remain connected, those forbidden turns cannot be present in 
the spanning tree. 

The DTP algorithm (see Figure 2) uses the data structures shown in Table 1
and acts as follows: the “initiator” or ROOT node (which can be any node) issues
a GO message to initiate a depth-first search. When a node receives a GO message
for the first time, it sets the sender as its father and marks itself as “explored”.
Table 1. DTP algorithm data structures

Variable | Description
V_i | Set of nodes directly connected to node i.
father_i | Variable which stores a node identity, or the value ROOT for the algorithm’s initiator node.
explored_i | Boolean variable. Takes the value true if i has already received a GO message, and false otherwise.
in_i | Node that sent the last GO message to i.
out_i | Last selected node to forward the GO message from i.
revisedNodes_i | Set of nodes that have sent or received a GO message from i. Initially ∅.
selectOut(S) | Selects an element from the set of nodes S.
backList_i | List of nodes. Initially ∅.
Q | Ordered set of type {path, turn}. Initially ∅.
update(Q, list_of_nodes, turn) | Appends “list_of_nodes” to the first element of the last element in Q, and “turn” to the second element of the last element in Q.
newTopology(Q) | Creates a topology with the information of the paths contained in Q. That is, the topology contains all the nodes and links present in Q’s paths.
checkCycle(Q, L, i, n) | Looks for cycles in newTopology(Q), not already broken by the turns belonging to Q, that connect the nodes in the list L · i · M · n, where M is a (possibly empty) list of paths. It returns a set of type {path, turn}, where each path exhibits a cycle broken by its associated turn.


Furthermore, it updates the elements of the GO message and forwards it to one
of its adjacent “non-explored” nodes. If it is not the first time it receives a GO
message (a cycle has been detected), it replies with a BACK message.

When a node receives a BACK message, it forbids the turn formed by the
node that sent it the BACK message and the node that sent it the last GO
message. Then, it continues forwarding a GO message over a link that has not
been explored yet. When all links have been explored, the node sends a SEARCH
message back to its father.

A SEARCH message indicates that the son has finished exploring its successors,
and it returns the topology the son has gathered. Such information is used to
check for possible cycles present in the topology collected by the son. Once the
node has checked the topology and forbidden turns to break the cycles found, it
continues transmitting GO messages, as explained above (if there are still nodes
to be explored), or sends a SEARCH message to its father (if all nodes have been
explored).

The algorithm ends when the initiator has explored all its successors. That 
is, when all links present in the topology have been explored. 

Regarding the number of messages required to complete the DTP algorithm,
it depends linearly on the number of links in the network (i.e. the cost of DTP
in terms of number of messages is O(|E|)). This can be readily seen
initiate_i::
    father_i ← ROOT;
    explored_i ← true;
    in_i ← ∅;
    out_i ← selectOut(V_i);
    revisedNodes_i ← {out_i};
    send message [GO, i, {({i}, ∅)}] to out_i;

rcvGO_j(L, Q):: activated when node i receives the message [GO, L, Q] from node j
    in_i ← j;
    revisedNodes_i ← revisedNodes_i ∪ {j};
    IF (explored_i = false) THEN
        explored_i ← true;
        father_i ← j;
        backList_i ← L;
        out_i ← selectOut(V_i − revisedNodes_i);
        IF (out_i ≠ ∅) THEN
            revisedNodes_i ← revisedNodes_i ∪ {out_i};
            update(Q, i, ∅);
            send message [GO, backList_i · i, Q] to out_i;
        ELSE
            send message [SEARCH, Q] to father_i;
    ELSE
        update(Q, i, ∅);
        send message [BACK, Q] to j;

rcvBACK_j(Q):: activated when node i receives the message [BACK, Q] from node j
    forbid_turn(in_i, i, out_i);
    update(Q, ∅, (in_i, i, out_i));
    IF (V_i − revisedNodes_i = ∅) THEN
        IF (father_i ≠ ROOT) THEN
            send message [SEARCH, Q] to father_i;
    ELSE
        out_i ← selectOut(V_i − revisedNodes_i);
        revisedNodes_i ← revisedNodes_i ∪ {out_i};
        Q ← Q ∪ (∅, ∅);
        update(Q, backList_i · i, ∅);
        send message [GO, backList_i · i, Q] to out_i;

rcvSEARCH_j(Q):: activated when node i receives the message [SEARCH, Q] from node j
    IF (V_i − revisedNodes_i = ∅) THEN
        FOR ALL n ∈ (V_i − {father_i, out_i}) DO
            T ← checkCycle(Q, backList_i, i, n);
            FOR EACH <PATH, TURN> ∈ T DO
                forbid_turn(TURN);
                Q ← Q ∪ (∅, ∅);
                update(Q, PATH, TURN);
        IF (father_i ≠ ROOT) THEN
            send message [SEARCH, Q] to father_i;
    ELSE
        out_i ← selectOut(V_i − revisedNodes_i);
        revisedNodes_i ← revisedNodes_i ∪ {out_i};
        Q ← Q ∪ (∅, ∅);
        update(Q, backList_i · i, ∅);
        send message [GO, backList_i · i, Q] to out_i;

Fig. 2. DTP algorithm



by observing Theorem 1 in the next section, which shows that the maximum
number of messages exchanged when applying the DTP algorithm is at most
2·|E|. Note that the TP algorithm solves an optimization problem
whose implementation requires global knowledge of the system. Consequently,
its implementation will require a higher number of messages than DTP.
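Because DTP circulates a single token (only the node holding the GO message acts), its GO/BACK/SEARCH exchange can be simulated sequentially as a depth-first search. The sketch below is a simplified, hypothetical model, not the authors' implementation: selectOut() is taken to be "first neighbour not yet in revisedNodes_i" over a neighbour order we chose to reproduce the execution described in Section 2.2, the turn forbidden on BACK uses the node's father as the incoming node (which is what happens in every BACK of that example), the Q bookkeeping is dropped, and the checkCycle() step run on SEARCH reception is not modelled. The Figure 3-a edge list is reconstructed from the cycles listed in Section 2.2.

```python
from typing import Dict, List, Optional, Set, Tuple

Turn = Tuple[Optional[int], int, int]  # (in, i, out): turn around node i


def dtp_traversal(adj: Dict[int, List[int]], root: int):
    """Sequential model of DTP's GO/BACK/SEARCH token traversal (no checkCycle step).

    Returns (father, forbidden_turns, message_counts); father[root] is None (ROOT).
    """
    father: Dict[int, Optional[int]] = {root: None}
    revised: Dict[int, Set[int]] = {v: set() for v in adj}   # revisedNodes_i
    turns: List[Turn] = []
    msgs = {"GO": 0, "BACK": 0, "SEARCH": 0}

    def hold_token(i: int) -> None:
        # selectOut(V_i - revisedNodes_i): first neighbour, in adj order, not yet revised.
        for out in adj[i]:
            if out in revised[i]:
                continue
            revised[i].add(out)
            revised[out].add(i)            # the receiver records the sender of the GO
            msgs["GO"] += 1
            if out not in father:          # first GO received by `out`: i is its father
                father[out] = i
                hold_token(out)            # depth-first: `out` explores its own links
                msgs["SEARCH"] += 1        # ...and then returns the token to i
            else:                          # `out` already explored: a cycle was closed
                msgs["BACK"] += 1
                turns.append((father[i], i, out))   # forbid_turn(in_i, i, out_i)

    hold_token(root)
    return father, turns, msgs


# Figure 3-a, with neighbour orders chosen to follow the execution of Section 2.2.
FIG3_ADJ = {0: [1, 4, 3, 2], 1: [0, 4, 3, 2], 2: [3, 0, 1], 3: [0, 1, 2], 4: [0, 1]}

if __name__ == "__main__":
    father, turns, msgs = dtp_traversal(FIG3_ADJ, root=1)
    print(father)   # {1: None, 0: 1, 4: 0, 3: 0, 2: 3}
    print(turns)    # [(0, 4, 1), (0, 3, 1), (3, 2, 0), (3, 2, 1)]
    print(msgs)     # {'GO': 8, 'BACK': 4, 'SEARCH': 4}
```

Started at node 1, the model produces the four BACK turns of the worked example, father pointers that form the spanning tree of Lemma 1, and 16 messages in total, i.e. exactly 2·|E| for the 8 links of the graph. The fifth turn of the example, (1, 0, 2), comes from checkCycle() and is therefore absent here.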



2.2 Example of Application of the DTP Algorithm 

To illustrate how the DTP algorithm works, we apply it to the graph of
Figure 3-a. We focus our attention on the messages exchanged by nodes in the
network. Note that the behavior described is only one of the possible executions
of the algorithm.

We can find the following cycles in the graph of Figure 3-a: {0, 3, 2, 0, 3},
{0, 3, 1, 0, 3}, {0, 3, 2, 1, 0, 3}, {0, 3, 1, 4, 0, 3}, {0, 1, 4, 0, 1}, {0, 1, 2, 0, 1},
{0, 4, 1, 2, 0, 1} and {1, 3, 2, 1, 3}.

Let us assume that the algorithm is started by node 1. After updating its
local data structures, it sends the message GO({1}, {({1}, ∅)}) to node 0, which will
send the message GO({1, 0}, {({1, 0}, ∅)}) to node 4. In turn, node 4 will send the message
GO({1, 0, 4}, {({1, 0, 4}, ∅)}) to node 1. As node 1 has already been explored, it
sends a BACK message to node 4. On reception of this message, node 4
will prohibit the turn (0, 4, 1): node 4 will not route any message received from
link e_{0,4} through link e_{4,1}, nor from link e_{4,1} to link e_{0,4}. The prohibition of this
Fig. 3. Example of utilization of the DTP algorithm. The graph in a) can produce
paths with cycles. After applying the DTP algorithm, the topology shown in b) is free of
cycles while keeping network connectivity. Curved lines with dots at both ends represent
the prohibited turns around a node (each dot touches a link of the prohibited turn).
For example, a packet cannot follow path {0, 4, 1}.



turn does not isolate node 4, as it is still possible for it to receive packets
targeted to it and to send packets from it.

For simplicity, from now on we omit the data contained in the
exchanged messages. As node 4 has already explored all its links, it sends a SEARCH
message to node 0 (its father). Node 0 will send a GO message to node 3.
Node 3 will send a GO message to node 1. As node 1 has already been explored,
it will reply to node 3 with a BACK message. On reception of this message,
node 3 will prohibit the turn (0, 3, 1).

Node 3 continues the depth search by sending a GO message to node 2, which 
sends a GO message to node 0, that in turn responds (to node 2) with a BACK 
message. On the reception of such a message, node 2 prohibits turn (3,2,0). 

Node 2 still has unexplored links, so it sends a GO message to node 1,
which responds with a BACK message; node 2 then prohibits turn (3, 2, 1).

As node 2 has explored all its links, it sends a SEARCH message to node 3 (its
father). The value of Q received by node 3 is Q = {<{1, 0, 4, 1}, {0, 4, 1}>,
<{1, 0, 3, 1}, {0, 3, 1}>, <{1, 0, 3, 2, 0}, {3, 2, 0}>, <{1, 0, 3, 2, 1}, {3, 2, 1}>}. On
reception of the SEARCH message, node 3 executes function checkCycle(Q, <1, 0>, 3, n)
for n ∈ {1, 2}. It has to find a cycle containing the pattern
P = <S_x · 1 · 0 · 3 · M · n · S_y>, where S_x, S_y and M are any valid subpaths obtained from
Q’s paths and P is a valid path. In this case no cycle is found, and node 3
sends a SEARCH message to node 0.

Node 0 performs function checkCycle(Q, <1>, 0, n) for n ∈ {2, 3, 4}, detecting
the cycle {1, 0, 2, 1}, which is broken by prohibiting turn (1, 0, 2). It then
sends a SEARCH message to node 1, which checks whether there are still
undetected cycles and concludes the algorithm.

The algorithm has forbidden a total of 5 turns, as shown in Figure 3-b. In
this new topology cycles cannot be formed, and connectivity is kept.
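Whether a set of prohibited turns leaves cycles, and whether it keeps the network connected, can be checked mechanically on directed links: link (u, v) may be followed by link (v, w) when w ≠ u and the turn (u, v, w) is not prohibited, and a cycle in this link-to-link graph is a walk that returns to the same link. The sketch below is our own checker, not part of DTP, and it again uses the Figure 3-a edge list reconstructed from the cycles listed in this section (an assumption). With only the four turns created by BACK messages a cycle remains (1, 0, 2, 1, 0), and adding turn (1, 0, 2) removes every cycle while keeping all nodes mutually reachable, as stated for Figure 3-b.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]        # directed link u -> v
Turn = Tuple[int, int, int]   # turn (a, v, b) around node v


def turn_allowed(a: int, v: int, b: int, prohibited: Set[Turn]) -> bool:
    # Prohibiting (a, v, b) forbids both a->v->b and b->v->a.
    return (a, v, b) not in prohibited and (b, v, a) not in prohibited


def next_links(link: Link, adj: Dict[int, List[int]], prohibited: Set[Turn]) -> List[Link]:
    u, v = link
    return [(v, w) for w in adj[v] if w != u and turn_allowed(u, v, w, prohibited)]


def has_cycle(adj: Dict[int, List[int]], prohibited: Set[Turn]) -> bool:
    """True if some sequence of permitted turns can revisit a directed link."""
    state: Dict[Link, int] = {}   # absent = unvisited, 1 = on DFS stack, 2 = finished

    def visit(link: Link) -> bool:
        state[link] = 1
        for nxt in next_links(link, adj, prohibited):
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[link] = 2
        return False

    links = [(u, v) for u in adj for v in adj[u]]
    return any(link not in state and visit(link) for link in links)


def connected_under_turns(adj: Dict[int, List[int]], prohibited: Set[Turn]) -> bool:
    """True if every node can reach every other node using only permitted turns."""
    for s in adj:
        seen: Set[Link] = set()
        stack = [(s, w) for w in adj[s]]
        while stack:
            link = stack.pop()
            if link not in seen:
                seen.add(link)
                stack.extend(next_links(link, adj, prohibited))
        if {s} | {v for _, v in seen} != set(adj):
            return False
    return True


# Figure 3-a, reconstructed from the cycles listed in Section 2.2 (assumption).
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3)]
ADJ: Dict[int, List[int]] = {v: [] for v in range(5)}
for a, b in EDGES:
    ADJ[a].append(b)
    ADJ[b].append(a)

GO_BACK_TURNS = {(0, 4, 1), (0, 3, 1), (3, 2, 0), (3, 2, 1)}
ALL_TURNS = GO_BACK_TURNS | {(1, 0, 2)}

if __name__ == "__main__":
    print(has_cycle(ADJ, GO_BACK_TURNS))          # True: 1 -> 0 -> 2 -> 1 -> 0 ...
    print(has_cycle(ADJ, ALL_TURNS))              # False
    print(connected_under_turns(ADJ, ALL_TURNS))  # True
```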
3 Correctness Proof

The DTP algorithm satisfies two important properties: (1) by forbidding turns,
it breaks all cycles in the network, and (2) the network remains connected after
the algorithm completes.

To verify these properties, we first establish some general properties of the
algorithm. First, we prove that the algorithm is of the traversal kind described
by G. Tel [10]. Then, it is easy to prove that the algorithm execution defines a
spanning tree in the system. To verify that the system remains connected, we
prove that the spanning tree always provides a way to transfer messages between
all the nodes of the system; that is, prohibited turns cannot be formed by links
of the spanning tree.

Theorem 1. The proposed algorithm is a traversal algorithm.

Proof: Because the GO message is sent at most once in each direction through
each link, it is sent at most 2·|E| times before the algorithm terminates.
Because each process sends the GO message through each link at most once, each
process receives a BACK or a SEARCH message through each link at most once.
Each time the GO message is held by a non-initiator p, node p has received
the GO message once more than it has sent it. This implies that the
number of links incident to p exceeds the number of used links of p by at least
one, so p does not decide, but forwards the GO message. It follows that the
decision takes place in the initiator. We prove in three steps that, when
the algorithm terminates, each node has forwarded the GO message.

1. All links of the initiator have been used once in each direction. Each link
has been used by the initiator either to send the GO message or to respond to
such a message forwarded by one of its neighbors; otherwise the algorithm
does not terminate. The initiator has received a SEARCH or a BACK message
exactly as often as it has sent a GO message, and it has sent a BACK message
exactly as often as it has received a GO message; because each message has been
received through a different link, it follows that all links have been used once
in both directions.

2. For each visited node p, all links of p have been used once in each direction.
Assume this is not the case, and let p be the earliest visited node for which
this property does not hold. By point (1), p is not the initiator. By the choice
of p, all links incident to the father of p have been used once in each direction,
which implies that p has sent the SEARCH message to its father. This implies
that p has used all its links to send or receive the GO message; but as the
algorithm terminates only when the initiator has received the message and all
its links have been explored, p has transmitted a message in both directions
through all its links. This is a contradiction.

3. All processes have been visited and each link has been used once in both
directions. If there were unvisited nodes, there would be neighbors p and q such
that p has been visited and q has not, contradicting the fact that each
link of p has been used in both directions. So, by point (2), all nodes were
visited, and all links are used once in both directions. ■
Traversal algorithms are a class of wave algorithms [10] and satisfy some
important properties. One of them is that each computation of the algorithm
defines a spanning tree in the network topology, as shown in the following lemma.
The root of the tree is the initiator, and each non-initiator p has stored its father
in the tree in the variable father_p at the end of the computation.

Lemma 1. Let C be a computation with one initiator p and, for each non-initiator
q, let father_q be the neighbor of q from which q received a depth-search
message (a GO message in our proposal) in its first event. Then the graph
T = (N, E_T), with N the set of nodes and E_T = {qr : q ≠ p ∧ r = father_q}, is a
spanning tree directed towards p.

Proof: The proof is equivalent to that presented in [10] by considering that the 
depth search message received in its first event is a GO message. ■ 

The proposed algorithm belongs to the class of traversal algorithms, but we
must still prove that it breaks all cycles and that the graph
remains connected.

Theorem 2. The algorithm breaks all cycles of the system. 

Proof: Since the proposed algorithm is a traversal algorithm (Theorem 1), the
GO messages gather the whole network topology, which is transmitted back to the
root node by SEARCH messages following the spanning tree defined by the nodes
of the system (Lemma 1). Each node of the network receives the GO message at least
once and sends a SEARCH message back. Before a SEARCH message is sent, the node
checks, using checkCycle(), whether it belongs to cycles in the network topology
that connects the node with all its reachable nodes along the spanning tree. If a
node detects the presence of cycles, it creates turns ensuring that it does not
belong to any cycle. In addition, this new information is included in the SEARCH
message sent back.

SEARCH messages are forwarded until they are received at the initiator of
the computation, and the computation terminates (Theorem 1). When handling the
SEARCH message, the initiator checks for cycles (checkCycle()) and breaks the
cycles still present in the network. The initiator creates turns for each cycle
that was not previously detected by other nodes in the network, because the
message contains the complete network topology and the turns defined in the
system. ■

Lemma 2. A node does not create turns between its father and son in the 
spanning-tree defined by the algorithm computation. 

Proof: For a given node, a turn is created after handling a BACK or SEARCH
message. According to the algorithm specification, a son cannot send a BACK
message to its father. SEARCH messages can only be sent by the sons of the node,
and in any case the father of the node is not taken into account when searching
for cycles (see the actions taken upon SEARCH reception); this holds for all the
nodes of the system. ■
Table 2. Percentage of prohibited turns of the TP and DTP algorithms when varying
the number of nodes. The network degree is 4.

Number of nodes | Prohibited turns, TP (%) | Prohibited turns, DTP (%) | Difference (%DTP - %TP)
16  | 18.7 | 23.0 | 4.3
32  | 17.0 | 21.0 | 4.0
64  | 17.4 | 19.6 | 2.2
128 | 16.8 | 18.7 | 1.9
255 | 16.7 | 18.5 | 1.8



Theorem 3. The system remains connected 

Proof: By Theorem 2, all cycles will be broken and, by Lemma 2, after the algorithm
executes there is, for each node of a cycle, at least one path to reach each node of
the cycle. So, the only way for the system to become disconnected would be for a
node to have turns defined for all its links, which is impossible by Lemma 2, and
the theorem is verified. ■



4 Performance Evaluation 

In this section we compare the performance of the DTP and TP algorithms
by means of extensive simulations.

Regarding the topologies we consider in our analysis, we have used the GT- 
ITM graph generator [11,12] since it permits us to generate different models of 
graphs that match current Internet topologies. For any concrete simulation, we 
generate 99 different topologies (33 flat random, 33 hierarchical and 33 transit- 
stub) and average the obtained results. 

The simulation experiments compare the percentage of prohibited turns of
the TP algorithm against that of the DTP algorithm. We expect beforehand
that the TP algorithm will perform better than the DTP algorithm, since it has
knowledge of the whole network topology.

In our first experiment, we analyze how increasing the number of nodes
affects the percentage of prohibited turns of both algorithms. We have
generated topologies with a fixed network degree (i.e. the average node
degree) of 4. The number of nodes ranges from 16 to 255. Table 2 shows the
results obtained. As can be seen, the TP algorithm prohibits fewer turns than
the DTP algorithm, but the difference is less than 5 percentage points, which is quite moderate.

In a second experiment, the number of nodes has been fixed to 120 and the
network degree has been varied from 4 to 10. The results in Table 3 show that,
when the network degree increases, the number of prohibited turns increases
in both algorithms. This is expected, since a higher network degree increases
the number of turns in the system and, consequently, the number of cycles. The
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Table 3. Percentage of prohibited turns between TP and DTP algorithms varying 
network degree. Number of nodes fixed to 120. 



Network 

degree 


Percentage of 
prohibited turns of 
TP 


Percentage of 
prohibited turns of 
DTP 


Difference of 
percentages 
(%DTP -%TP) 


4 


16.7 


18.7 


2.0 


6 


20.0 


22.4 


2.4 


8 


24.5 


27.5 


3.0 


10 


25.0 


28.4 


3.4 



results show that, even though the TP algorithm prohibits less turns than the 
DTP , the difference is also quite moderate (less than 4%). 

5 Self-Configuring Distributed Turn Prohibition Algorithm (SDTP)

One important issue in computer networks is the capability to dynamically adapt to the failure of nodes or links. In this section, we describe how the DTP algorithm can be enhanced to tolerate fail-stop node crashes without having to restart the algorithm from the beginning. The resulting algorithm, SDTP, works as follows:

1. When a node handles a GO message for the first time, it labels itself in an ordered fashion.

2. If a node with label l fails (in a fail-stop fashion):

a) Mark as “invalidated” all nodes with a label higher than l.

b) Among the nodes with a label lower than l that are linked to any of the “invalidated” nodes, select the one with the highest label, and apply DTP starting at this node.

c) If “invalidated” nodes still remain, repeat the previous step until no “invalidated” node is connected to a “non-invalidated” node.

The correctness proof of this new algorithm is almost the same as that of DTP, so we omit it. Node failures may leave some nodes isolated; clearly, in that case it is not possible to guarantee that the network remains connected.
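The recovery steps 2a to 2c can be sketched as follows. DTP itself (message exchange and turn prohibition) is not reproduced here; instead, a hypothetical stand-in `relabel_from` relabels the invalidated nodes reachable from the start node in depth-first order, continuing from the highest label still valid, which is enough to reproduce the relabelling of the Figure 4 example. When step 2b is repeated, the sketch picks the highest-labelled valid node adjacent to an invalidated node, including nodes relabelled in an earlier round; this reading of step c is an assumption. Labels are integers (0 for A, 1 for B, and so on) and links are assumed to be bidirectional.

```python
from typing import Callable, Dict, Set, Tuple

Graph = Dict[int, Set[int]]   # node id -> ids of linked nodes (undirected)
Labels = Dict[int, int]       # node id -> ordered label (0 = "A", 1 = "B", ...)


def relabel_from(graph: Graph, labels: Labels, start: int, invalid: Set[int]) -> None:
    """Hypothetical stand-in for re-running DTP from `start`: a depth-first walk
    through invalidated nodes that gives each one the next free label in the
    order it is reached. Turn prohibition itself is not modelled."""
    next_label = max(lab for n, lab in labels.items() if n not in invalid) + 1
    stack = [start]
    while stack:
        node = stack.pop()
        for nb in sorted(graph[node]):
            if nb in invalid:
                invalid.discard(nb)          # node becomes valid again
                labels[nb] = next_label
                next_label += 1
                stack.append(nb)


def sdtp_recover(graph: Graph, labels: Labels, failed: int,
                 rerun: Callable[[Graph, Labels, int, Set[int]], None] = relabel_from,
                 ) -> Tuple[Labels, Set[int]]:
    """Apply steps 2a-2c after a fail-stop crash of `failed`.
    Returns the new labels and the nodes still invalidated (e.g. isolated ones)."""
    graph = {n: set(nbs) for n, nbs in graph.items()}   # work on copies
    labels = dict(labels)
    failed_label = labels.pop(failed)
    for nb in graph.pop(failed):                        # remove the crashed node
        graph[nb].discard(failed)
    # 2a) invalidate every node whose label is higher than the failed one
    invalid = {n for n, lab in labels.items() if lab > failed_label}
    while True:
        # 2b) valid nodes linked to at least one invalidated node
        frontier = [n for n in labels if n not in invalid and graph[n] & invalid]
        if not frontier:   # 2c) no invalidated node is linked to a valid one
            break
        start = max(frontier, key=lambda n: labels[n])
        rerun(graph, labels, start, invalid)
    return labels, invalid
```

With a hypothetical five-node topology in which node 2/E is linked only to nodes 3/D and 0/B, failing node 3 relabels node 2 with label 3 (D), as in Figure 4.b; nodes that cannot be reached from any valid node are returned in the second element of the result. If the failed node held the lowest label, no valid node remains to restart from and every other node is reported as still invalidated.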

Figure 4 presents an example of how SDTP behaves. For the sake of clarity, each vertex carries two labels: the original vertex number and, as a letter, the label given by the SDTP algorithm (letters are in alphabetical order).

In a similar way, an algorithm can be provided to add new nodes (or subgraphs) to an SDTP-conformant graph. An intuitive first approach would consist of: (1) determining the node with the lowest label sharing a link with any of the new nodes, (2) invalidating all the nodes with a label equal to or higher than that of this node, and (3) applying SDTP from this node.

Fig. 4. Example of the behavior of the SDTP algorithm. Figure 4.a represents the topology of Figure 3.b. If node 3/D fails, node 2/E is invalidated (its label is higher than that of node 3/D). Since the node with the highest label connected to node 2/E is node 0/B, SDTP is applied starting from node 0/B. The resulting topology after recovery from the failure of node 3/D is shown in Figure 4.b. Note that the label of node 2 has changed to 2/D.



6 Conclusions and Future Work 

We have presented a novel distributed algorithm that guarantees network stability by forbidding turns without losing network connectivity. Its performance has been compared against the TP algorithm, and we found that the difference between them (in terms of prohibited turns) is quite moderate. However, unlike TP, whose implementation requires global knowledge of the system, DTP is fully distributed and does not require global knowledge of the network. Finally, we have enhanced DTP to tolerate fail-stop node crashes without having to restart the algorithm from the beginning.

We are currently working on a version of the algorithm that permits multiple initiator nodes.
Abstract. Multihoming is currently widely adopted to provide fault tolerance and traffic engineering capabilities. It is expected that, as telecommunication costs decrease, its adoption will become increasingly widespread. Current multihoming support is not designed to scale up to the expected number of multihomed sites, so alternative solutions are required, especially for IPv6. In order to preserve interdomain routing scalability, the new multihoming solution has to be compatible with Provider Aggregatable addressing. However, such an addressing scheme requires multihomed sites to configure multiple prefixes, which in turn causes several operational difficulties within those sites that may even result in communication failures while all the ISPs are working properly. In this note we propose the adoption of Source Address Dependent routing within the multihomed site to overcome the identified difficulties.



1 Introduction 

Since the operations of a wide range of organizations rely on communications over the Internet, access links are a critical resource for them. As a result, sites are improving the fault tolerance and QoS capabilities of their Internet access through multi-homing, i.e., achieving global connectivity through several connections supplied by different Internet Service Providers (ISPs). However, the widespread use of the currently available IPv4 multi-homing solution is jeopardizing the future of the Internet, since it has become a major contributor to the post-CIDR growth in the number of global routing table entries [1]. Therefore, in IPv6 the use of Provider Aggregatable (PA) addressing is recommended for all sites, including multihomed ones, in order to preserve the scalability of the interdomain routing system. While such an addressing architecture reduces the number of routing table entries in the Default Free Zone of the Internet, its adoption presents a number of challenges for end sites, especially for those that multihome. Essentially, when PA addressing is adopted, a multihomed site has to configure multiple addresses, one per ISP, on every node of the site in order to be reachable through all its providers. Such a configuration poses quite a few challenges for adoption, since current hosts are not prepared to deal with multiple addresses per interface as required. In this note, we present how Source Address Dependent (SAD) routing can be adopted to deal with some of the difficulties of this configuration.

The rest of this paper is structured as follows. First, we present the rationale for adopting SAD routing within multihomed sites. Then, we detail the different configurations of SAD routing that may be required in different sites, including some trials we performed, and next we present the capabilities of the resulting configuration. Finally, we present the conclusions of this work.



2 Rationale 



2.1 Current IPv4 Multihoming Technique and Capabilities 

As mentioned above, a site is multi-homed when it obtains Internet connectivity through two or more service providers. Through multi-homing, an end site improves the fault tolerance of its connection to the global network, and it can also apply Traffic Engineering (hereafter TE) techniques to select the path used to reach the different networks connected to the Internet.

In IPv4, the most widely deployed multi-homing solution is based on announcing the site prefix through all its providers. In this configuration, the site S obtains a Provider Independent (PI) prefix allocation directly from the Regional Internet Registry. The site then announces this prefix to its providers using BGP [2]; the providers in turn announce the prefix to their own providers and so on, so that eventually the route is announced in the Default Free Zone.

This mechanism provides fault tolerance capabilities, including the preservation of established connections throughout an outage. In addition, the following TE tools are available to the multihomed site. For outgoing traffic, the site can define which of the available exit paths will carry traffic to a given destination by properly configuring the LOCAL_PREF attribute of BGP [3]. For incoming traffic, the site can influence the ISP through which it prefers to receive traffic by using the AS prepending technique, which consists of artificially making the route through one of the providers less attractive to external networks by adding extra AS numbers to the AS_PATH attribute of BGP [3] (it should be noted that, in this case, the ultimate decision of which ISP will be used to forward packets to the site belongs to the external site that actually forwards the traffic).
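To make the two TE tools concrete, the sketch below models only the first two steps of BGP route selection: prefer the route with the higher LOCAL_PREF, then the one with the shorter AS_PATH. Real BGP implementations apply further tie-breakers, which are omitted. The `Route` class, the `best_route` function and the AS numbers (taken from the documentation range 64496-64511) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    via: str             # neighbour the route was learned from
    local_pref: int      # LOCAL_PREF: set by local policy, higher is preferred
    as_path: List[int]   # AS_PATH: ASes the announcement has traversed


def best_route(candidates: List[Route]) -> Route:
    """Heavily simplified BGP selection: highest LOCAL_PREF first, then the
    shortest AS_PATH. Further tie-breakers are deliberately left out."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


SITE, ISPA, ISPB, REMOTE = 64496, 64497, 64498, 64499   # documentation ASNs

# Incoming TE: the site prepends its own AS twice on the announcement sent
# to ISPB, so a remote network using default LOCAL_PREF prefers ISPA.
inbound = [Route("ISPA", 100, [ISPA, SITE]),
           Route("ISPB", 100, [ISPB, SITE, SITE, SITE])]

# Outgoing TE: the site raises LOCAL_PREF on the route learned from ISPB,
# which then wins even though its AS_PATH is longer.
outbound = [Route("ISPA", 100, [ISPA, REMOTE]),
            Route("ISPB", 200, [ISPB, 64500, REMOTE])]
```

The asymmetry discussed above shows up directly: LOCAL_PREF is evaluated by the site's own routers, whereas the prepended AS_PATH only matters if the remote network has not set a LOCAL_PREF of its own that overrides it.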

While the presented IPv4 multihoming solution provides fairly good features regarding fault tolerance and TE, its scalability with respect to the interdomain routing system is very limited. Because multihomed sites use PI addressing, each multi-homed site using this solution contributes routes to the Default Free Zone routing table, imposing additional stress on already oversized routing tables. For this reason, more scalable multi-homing solutions are being explored for IPv6 [4], in particular solutions that are compatible with the use of PA addressing in multihomed sites, as presented next.
2.2 Provider Aggregation and Multi-homing



In order to reduce the routing table size, the use of PA addressing is required. This means that sites obtain prefixes that are part of their provider's allocation, so that each provider only announces its complete aggregate to its own providers and does not announce prefixes belonging to other ISPs' aggregates, as presented in Figure 1.

[Figure 1: site S is connected to ISPA (prefix PA) and to ISPB (prefix PB); ISPA and ISPB announce only their aggregates PA and PB to the Internet through BGP. Links 1 to 4 connect the ISPs to the Internet and to the site, and host H1 in site S has one address from each ISP, PASite:H1 and PBSite:H1.]

Fig. 1. Provider aggregation of end-site prefixes



When provider aggregation of end-site prefixes is used, each end-site host interface 
obtains one IP address from each allocation, in order to be reachable through all the 
providers and benefit from multi-homing capabilities (note that ISPs will only 
forward traffic addressed to their own aggregates). 

This configuration raises several concerns, presented next.

• Difficulties in communication in case of failure. When Link 1 or Link 3 becomes unavailable, addresses containing the PASite prefix are unreachable from the Internet.

• Ingress filtering [5] is a widely used technique for preventing the use of spoofed addresses. However, in the described configuration, its use creates additional difficulties for the source address selection mechanism and the intra-site routing system, since the source address of a packet must be consistent with its exit path (i.e., belong to the prefix of the ISP through which the packet leaves the site) for the packet to pass the ingress filters.

• Established connections will not be preserved in case of an outage. If Link 1 or Link 3 fails, already established connections that use addresses containing the PASite prefix will fail, since packets addressed to the PASite aggregate will be dropped because there is no route available for this destination. Note that an alternative path exists, but the routing system is not aware of it.
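The consistency requirement behind the second concern can be made concrete with a minimal model of an ISP ingress filter on its customer link: a packet is accepted only if its source address falls within the prefix that the ISP delegated to the site. The two prefixes stand in for PASite and PBSite and are hypothetical values from the IPv6 documentation range.

```python
import ipaddress

# Hypothetical stand-ins for the PASite and PBSite prefixes.
DELEGATED = {
    "ISPA": ipaddress.IPv6Network("2001:db8:a::/48"),   # PASite
    "ISPB": ipaddress.IPv6Network("2001:db8:b::/48"),   # PBSite
}


def passes_ingress_filter(isp: str, src: str) -> bool:
    """An ISP filtering its customer link accepts a packet only if the source
    address lies inside the prefix that this ISP delegated to the site."""
    return ipaddress.IPv6Address(src) in DELEGATED[isp]
```

A packet sourced from PASite:H1 (here 2001:db8:a::1) that leaves the site through ISPB is therefore dropped, which is why the exit path has to follow the source address.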

The presented difficulties show that additional mechanisms are needed to allow the use of PA addresses while still providing benefits equivalent to those of the incumbent multi-homing solution. In this note, we explore the possibility of using Source Address Dependent routing as a tool to help overcome the identified difficulties.



3 Source Address Dependent (SAD) Routing 

Source Address Dependent (SAD) routing essentially means that routers maintain as many routing tables as there are source address prefixes involved, and packets are routed according to the routing table corresponding to the source address prefix that best matches the source address contained in the packet header.
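The two-stage lookup just described can be sketched as follows: the router first selects a routing table by longest-prefix match on the source address, and then performs an ordinary longest-prefix match on the destination address within that table. This is a minimal illustration under assumed names (`SadRouter`, `add_route`, `lookup`) and assumed IPv6 documentation prefixes, not a router implementation; a packet whose source matches no configured prefix simply gets no route here.

```python
import ipaddress
from typing import Dict, Optional, TypeVar

T = TypeVar("T")


def longest_match(table: Dict[ipaddress.IPv6Network, T],
                  addr: ipaddress.IPv6Address) -> Optional[T]:
    """Value stored under the most specific prefix that contains addr, if any."""
    matches = [p for p in table if addr in p]
    return table[max(matches, key=lambda p: p.prefixlen)] if matches else None


class SadRouter:
    """Hypothetical router model: one destination routing table per source prefix."""

    def __init__(self) -> None:
        self.tables: Dict[ipaddress.IPv6Network,
                          Dict[ipaddress.IPv6Network, str]] = {}

    def add_route(self, src_prefix: str, dst_prefix: str, next_hop: str) -> None:
        src = ipaddress.IPv6Network(src_prefix)
        dst = ipaddress.IPv6Network(dst_prefix)
        self.tables.setdefault(src, {})[dst] = next_hop

    def lookup(self, src: str, dst: str) -> Optional[str]:
        # 1) select the routing table whose source prefix best matches src
        table = longest_match(self.tables, ipaddress.IPv6Address(src))
        if table is None:
            return None
        # 2) ordinary longest-prefix match on the destination in that table
        return longest_match(table, ipaddress.IPv6Address(dst))
```

For example, with a default route towards the ISPA exit in the table of 2001:db8:a::/48 and a default route towards the ISPB exit in the table of 2001:db8:b::/48, the same destination is reached through different exits depending only on the source address chosen by the host.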

SAD routing can be used to make the routing of packets flowing from the multihomed site to the Internet compatible with ingress filtering. In this case, the source address of the exiting packets has been determined by the party that initiated the communication (the host in the multihomed site or, through its selection of the destination address of the initial packet, the external host), and the routing system then forwards each packet to the appropriate exit router so that it passes ingress filtering. Source address selection therefore determines the ISP used in both directions: because of ingress filtering, the source address determines the forward path from the multihomed site to the rest of the Internet, and since the source address used in the initial packets becomes the destination address of the reply packets, it also determines the ISP used on the reverse path.

Since source address selection implies ISP selection, the adoption of SAD routing 
will also affect the mechanisms to be used in multihomed sites to define TE. In 
particular, it will shift TE capabilities from the routing system to the hosts themselves. 

We will next evaluate the adoption of SAD routing in two typical multihomed configurations: sites running BGP but not redistributing the BGP information into an IGP, and sites running an IGP to select the exit path. A third possible configuration uses static routes in the multihomed site. However, this configuration is fairly simple and several commercial routers already support it, so we do not describe it in full. Nevertheless, it should be noted that when SAD routing is used, fault tolerance and TE capabilities can be obtained without dynamic routing, since these features are then supported by the hosts themselves rather than by the routing system.

In order to enable SAD routing within a site, SAD routing support is not required in all the routers of the site, but it has to be adopted in a connected SAD routing domain that contains all the exit routers [6], as presented in Figure 2.




Note that it is not necessary for the generic routing domain to be connected, i.e. it can 
be formed by a set of disconnected domains, all connected to the SAD routing 
domain. 
3.1 Sites Running BGP but Without Redistribution of BGP Information into IGP

Current IPv4 multihomed sites usually run BGP with their providers, and through BGP they obtain reachability information from each of their ISPs. However, because of operational issues, some sites do not redistribute the information obtained through BGP into the IGP [3]. So, in order to properly select the intra-site path towards an external destination, they include in the IBGP mesh all the routers that are required to select the exit path: not only the site exit routers, but also other internal routers that have access to multiple exit routers. This means that the IBGP cloud wraps the non-BGP-aware routing domain, as presented in Figure 3.




It should be noted that only the IBGP mesh must be connected, and that the non-BGP-aware region may be formed by multiple disconnected domains, linked only through the IBGP domain. Clearly, only the routers included in the IBGP mesh need to implement SAD routing in order to properly select the site exit path. Since all these routers run BGP, BGP capabilities can be used to provide SAD routing support.

In order to implement SAD routing, each exit router running EBGP has to attach a color tag to the routes received from its ISP, so that the routes learned through each ISP can be identified. Additionally, once the routing information is colored, each color has to be mapped to a source address prefix. Once both a color and its corresponding prefix are available, it is possible to construct SAD routing tables containing routing information per source prefix.

SAD routing can be implemented in this scenario using the BGP Communities attribute [7] to color the routing information. Consider a multihoming scenario where a multihomed site has n ISPs, and ISPi has assigned prefix Pref_i to the multihomed site, for i = 1, ..., n. In order to adopt SAD routing, the following is required:

- First, a private community value is assigned to each ISP, so that value Com_i is assigned to the routes obtained from ISPi, for i = 1, ..., n.

- Second, n routing tables are created in each of the routers involved, so that each router has one routing table per prefix in the site (i.e., per ISP). Additionally, each router is configured to route packets whose source address matches Pref_i using routing table i.

- Third, BGP processing rules are configured in each router, so that routes containing a community attribute value equal to Com_i only affect routing table i.

- Finally, each exit router peering with an external router in ISPi is configured to attach the community value Com_i to all the routes received from ISPi when announcing them through IBGP.

The resulting behavior is that each router within the IBGP mesh has separate routing tables containing the information learned through each ISP. Packets whose source address belongs to the prefix of ISPi are routed using the corresponding routing table.
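The coloring scheme can be illustrated with a small simulation of the two roles involved: the exit router tags the routes learned from ISPi with Com_i, and every router in the IBGP mesh uses the community to decide which per-source-prefix table a route updates. All names, community values and prefixes below are hypothetical (64496 is a documentation AS number); this models the data flow only and is neither a BGP implementation nor a vendor configuration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Community = Tuple[int, int]               # community written as (ASN, value)
Update = Tuple[str, str, Set[Community]]  # (destination prefix, next hop, communities)

# Hypothetical site with n = 2 ISPs.
COM: Dict[str, Community] = {"ISP1": (64496, 1), "ISP2": (64496, 2)}   # Com_i
PREF: Dict[Community, str] = {(64496, 1): "2001:db8:1::/48",           # Com_i -> Pref_i
                              (64496, 2): "2001:db8:2::/48"}


def tag_at_exit_router(isp: str, routes: List[Tuple[str, str]]) -> List[Update]:
    """Exit router peering with ISPi: attach Com_i to every route learned
    from ISPi before announcing it through IBGP."""
    return [(dst, next_hop, {COM[isp]}) for dst, next_hop in routes]


def install(updates: List[Update]) -> Dict[str, Dict[str, str]]:
    """Router in the IBGP mesh: a route carrying Com_i only updates routing
    table i, identified here by the source prefix Pref_i that selects it."""
    tables: Dict[str, Dict[str, str]] = defaultdict(dict)
    for dst, next_hop, communities in updates:
        for com in communities:
            if com in PREF:   # communities not used for coloring are ignored
                tables[PREF[com]][dst] = next_hop
    return dict(tables)
```

A destination learned from both ISPs ends up in both tables, each with the exit router of the corresponding ISP as next hop, so the source prefix alone decides which exit is used.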



3.2 Sites Using IGP 

In this scenario, the multihomed site uses an IGP to distribute information about both internal and external destinations. The IGP learns about external destinations in one of the following three ways:

- Manually configured routes are imported into the IGP 

- BGP redistribution into the IGP 

- IGP exchange with the providers 

As in the previous case, SAD routing support is not required in the whole routing system of the multihomed site, but only in a connected domain containing all the exit routers. However, while BGP provides mechanisms to tag routing information so that the same protocol instance can propagate information with different scopes, as presented in the previous section, current IGPs do not provide such a capability.

In order to provide SAD routing support, different instances of the routing protocol run in parallel, each associated with a source address prefix. In this way, different instances of the IGP update different routing tables within the routers. The main difficulty with this approach is to differentiate the messages corresponding to the different instances of the IGP. Normally, different instances of an IGP run on different interfaces, so that each instance only receives its own messages. In this case, however, we want to run multiple instances of the IGP on the same interfaces, so we need a way to separate messages according to the IGP instance they belong to.

One possibility would be to send IGP messages using global addresses as source addresses. IGP messages are usually sent using link-local addresses, but since each router can be configured with multiple IP addresses, one per prefix, the router could include a different source address in the messages of each IGP instance. This ships-in-the-night strategy would allow each IGP instance to operate as if it were running alone on the link.

In particular, OSPF for IPv6 [8] explicitly supports running multiple instances on the same link: packets belonging to different instances are identified by the Instance_ID field in the OSPF header.
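As an illustration of how a router could demultiplex such packets, the sketch below parses the 16-byte OSPFv3 common header (Version, Type, Packet length, Router ID, Area ID, Checksum, Instance ID and a reserved zero byte, as laid out in the OSPF for IPv6 specification) and maps the Instance ID to the routing table that instance updates. The `INSTANCE_TO_TABLE` mapping and the table names are hypothetical, and a real OSPFv3 implementation performs far more validation than the checks shown.

```python
import struct
from typing import Dict, Optional

# OSPFv3 common header, 16 bytes in network byte order:
# Version(1) Type(1) Packet length(2) Router ID(4) Area ID(4)
# Checksum(2) Instance ID(1) reserved, must be 0 (1)
OSPF3_HEADER = struct.Struct("!BBHIIHBB")

# Hypothetical configuration: one OSPFv3 instance per site prefix, each
# instance feeding its own source-address-dependent routing table.
INSTANCE_TO_TABLE: Dict[int, str] = {0: "table-PrefA", 1: "table-PrefB"}


def instance_id(packet: bytes) -> int:
    """Extract the Instance ID from an OSPFv3 packet."""
    if len(packet) < OSPF3_HEADER.size:
        raise ValueError("packet shorter than the OSPFv3 header")
    version, _type, _length, _router_id, _area_id, _checksum, inst, _zero = \
        OSPF3_HEADER.unpack_from(packet)
    if version != 3:
        raise ValueError("not an OSPFv3 packet")
    return inst


def dispatch(packet: bytes) -> Optional[str]:
    """Routing table updated by the instance this packet belongs to, or None
    if no local instance uses that Instance ID (the packet is then ignored)."""
    return INSTANCE_TO_TABLE.get(instance_id(packet))
```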



3.3 Experimenting with SAD Routing 

We will next analyze the deployability of the approach by evaluating the support for SAD routing available in current implementations. In order to assess the deployment effort required to adopt the proposed solution, we have built a testbed with widely available commercial routers and performed some trials in the framework of the Optinet6 research project. The testbed evaluated the SAD routing capabilities of Cisco 2500 routers, Cisco 7500 routers and Juniper M10 routers.

All the tested routers support static SAD routing, i.e., routing based on the source address of the packets according to statically defined routes. However, the implementation of SAD routing support differs considerably between them. Cisco IOS supports static SAD routing through manually defined rules, called route-maps, that affect the processing of packets. In order to enable SAD routing, a route-map has to be defined for each source address dependent route. Juniper routers, on the other hand, support multiple routing tables, so it is possible to create as many routing tables as there are source address prefixes involved, and then define the rules required for the router to forward packets according to the routing table associated with the prefix contained in the source address. In the case of static SAD routing, the multiple routing tables are configured manually with the desired static routes.

Regarding dynamic SAD routing, the support provided by Cisco routers is very limited. Because SAD routing is supported through route-maps, which are defined manually by the router operator, Cisco routers cannot dynamically update the routing information (i.e., the route-maps) involved in SAD routing. This means that neither the BGP nor the IGP case is supported by this vendor's routers.

Because Juniper routers support multiple parallel routing tables, they support dynamic SAD routing more naturally. In the case of BGP, different routing tables must be updated depending on the value of the community attribute contained in the BGP route. While this seems straightforward, it is not currently supported by Juniper routers because of an existing constraint that a given instance of a routing protocol can only update a single routing table, which prevents a single BGP instance from updating different routing tables based on the value of the community attribute. This limitation does not apply to the IGP case, since the considered approach uses multiple instances of the IGP running simultaneously, one per source prefix involved, each of which updates its corresponding routing table. This configuration is currently supported in Juniper routers for OSPFv2 and also for BGP. It should be noted that this approach, i.e., running multiple instances of BGP in parallel, can be used as a temporary solution for the BGP case until the community-based approach becomes available.



4 Resulting Capabilities 



4.1 Fault Tolerance Capabilities 

Since the basic assumptions behind adopting SAD routing for multihoming support are that the source address is determined by the initiating host and that each source address prefix determines an exit ISP, fault tolerance capabilities are provided by the hosts themselves. As described in the Host Centric Approach [6], such mechanisms are based on a trial-and-error procedure: since each source address available in a host is bound to an exit path, the host can try different exit paths by changing the source address. The main difference between the site configurations discussed below is how quickly the host can learn that a destination address is unreachable through the selected path.

When external routes are static, the intra-site routing system has no external reachability information, so the packet is forwarded outside the site, and only when it reaches routers with richer knowledge about the topology is it possible to determine whether the requested destination is reachable through the selected path. In the worst case, the initiating host will time out and retry with a different path.

When the multihomed site runs BGP or an IGP with its providers, reachability information is available closer to the host, i.e., in the site's routers, so in some cases unreachability will be discovered faster than in the general case, where it is learned through timeouts. The host attempts to use one of its source addresses to reach a certain destination. The packet is routed through the generic routing domain to the SAD routing domain, where the routers determine whether the selected destination is reachable with the selected source address, i.e., whether a route to the selected destination exists in the routing table associated with the selected source address prefix. The possible resulting behaviors are:

• If the selected destination is reachable through the selected source address, the packet is forwarded towards the site exit router that leads to the ISP corresponding to the selected source address prefix.

• If the selected destination is not reachable through the selected source address, but it is reachable through an alternative source address, the packet is discarded and an ICMP Destination Unreachable message with Code 5 (Source Address Failed Ingress Policy) [9] is sent back to the host. The proper source address prefix can be conveyed in this message, for instance in the source address of the ICMP message. The host then retries using the suggested source address.

• If the selected destination is unreachable, the packet is discarded and an ICMP Destination Unreachable message is sent back to the host. In this case, the host may retry if an alternative destination address is available.
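The three router behaviors, together with the host's trial-and-error retry, can be sketched as follows. Each site source prefix is modelled by the set of destination prefixes reachable through its ISP, and the router returns a verdict plus, for Code 5, the suggested prefix. The function names, the `Verdict` values and the simple "move matching addresses to the front" host policy are illustrative assumptions; ICMP messages are represented as return values rather than packets, and site prefixes are assumed not to overlap.

```python
import ipaddress
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Verdict(Enum):
    FORWARD = "forward to the exit router of the matching ISP"
    SOURCE_POLICY = "ICMP Destination Unreachable, Code 5"
    UNREACHABLE = "ICMP Destination Unreachable"


Tables = Dict[str, Set[str]]   # source prefix -> destination prefixes it can reach


def sad_check(tables: Tables, src: str, dst: str) -> Tuple[Verdict, Optional[str]]:
    """Decision of a router in the SAD routing domain. For Code 5 the second
    element is the suggested source prefix (e.g. carried as the ICMP source)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)

    def reachable(src_prefix: str) -> bool:
        return any(d in ipaddress.ip_network(p) for p in tables[src_prefix])

    # Site prefixes are assumed not to overlap, so the first match is the match.
    own = next((p for p in tables if s in ipaddress.ip_network(p)), None)
    if own is not None and reachable(own):
        return Verdict.FORWARD, None
    alt = next((p for p in tables if p != own and reachable(p)), None)
    if alt is not None:
        return Verdict.SOURCE_POLICY, alt
    return Verdict.UNREACHABLE, None


def host_connect(sources: List[str], dst: str,
                 tables: Tables) -> Tuple[Optional[str], List[str]]:
    """Hypothetical host side: trial and error over its source addresses.
    Returns the working source address (or None) and the addresses tried."""
    queue, tried = list(sources), []
    while queue:
        src = queue.pop(0)
        tried.append(src)
        verdict, hint = sad_check(tables, src, dst)
        if verdict is Verdict.FORWARD:
            return src, tried
        if verdict is Verdict.UNREACHABLE:
            break   # no source prefix helps; only another destination could
        # Code 5: move local addresses inside the suggested prefix to the front.
        net = ipaddress.ip_network(hint)
        queue.sort(key=lambda a: ipaddress.ip_address(a) not in net)
    return None, tried
```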



4.2 Traffic Engineering (TE) Capabilities 

As a consequence of using multiple prefixes in multihomed sites in conjunction with SAD routing, the party that selects which address of the multihomed host is used during the communication is also the party that determines the ISP used for the packets of that communication. So, TE mechanisms have to influence this selection. It must be noted that the addresses used in a communication are determined by the party initiating it, so in this environment policy mechanisms do not affect incoming and outgoing traffic separately; instead, they affect packets belonging to externally initiated communications and packets belonging to internally initiated communications differently. This is the first difference with respect to the IPv4 case.
4.2.1 TE for Externally Initiated Communications

When a host outside the multihomed site attempts to initiate a communication with a host within the site, it first obtains the set of destination addresses and then selects one according to the Default Address Selection procedure [10]. The only point where the multihomed site can then express TE considerations thus seems to be the DNS server replies: the DNS server can be configured to modify the order of the returned addresses to express some form of TE constraint.

This mechanism can work well to provide some form of load balancing and load sharing. The DNS server can be configured so that x% of the queries are answered with an address from the ISPA prefix first, and the remaining (100 - x)% with an address from the ISPB prefix first. In addition, SRV records [11] can be used to provide enhanced capabilities for those applications that support them. When the host receives the list of addresses, it processes them according to RFC 3484. If none of the rules described there applies, the list is left unchanged and the first address received is tried first. Note that the list may be reordered by the address selection algorithm because of the host policies.
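A minimal sketch of the x% policy on the DNS side, assuming the server simply counts queries and, within every block of 100 consecutive queries, answers the first x with the ISPA address first. The function name and addresses are hypothetical; actual DNS servers implement answer ordering in their own ways, and, as noted above, the querying host may still reorder the list.

```python
from typing import List


def answer_order(query_index: int, x_percent: int, addr_a: str, addr_b: str) -> List[str]:
    """Hypothetical DNS-side ordering: in every block of 100 consecutive
    queries, the first x_percent answers list the ISPA address first and
    the remaining (100 - x_percent) list the ISPB address first."""
    if not 0 <= x_percent <= 100:
        raise ValueError("x_percent must be between 0 and 100")
    if query_index % 100 < x_percent:
        return [addr_a, addr_b]
    return [addr_b, addr_a]
```

A deterministic counter is used here instead of random sampling so that the split is exactly x% over every 100 queries; a randomized choice would only approach the same ratio on average.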

4.2.2 TE for Internally Initiated Communications 

For internally initiated communications, the exit ISP is determined by the source address included in the initiating packet. This means that the source address selection mechanism [10] determines the exit ISP. RFC 3484 defines a policy table that can be configured to express TE considerations. The policy table allows fine-grained policy definitions in which a source address can be matched with a destination address or prefix, which covers most of the required policy configurations.
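The sketch below shows how the policy table can express such a preference through RFC 3484 source address selection Rule 6 (prefer a source address whose label matches the destination's label). Labels are looked up by longest-prefix match in the policy table. Only Rule 6 is implemented: the other rules are omitted, ties fall back to the order of the candidates, and all prefixes and label values except the ::/0 default row are hypothetical. The precedence column is kept for completeness but is only used by destination address selection.

```python
import ipaddress
from typing import List

# (prefix, precedence, label); the ::/0 row follows the RFC 3484 default table.
POLICY_TABLE = [
    ("::/0",              40, 1),
    ("2001:db8:a::/48",   45, 10),   # hypothetical PASite prefix
    ("2001:db8:b::/48",   45, 11),   # hypothetical PBSite prefix
    ("2001:db8:100::/48", 45, 10),   # destinations to be reached via ISPA
    ("2001:db8:200::/48", 45, 11),   # destinations to be reached via ISPB
]


def label(addr: str) -> int:
    """Label of the most specific policy-table prefix containing addr."""
    a = ipaddress.ip_address(addr)
    matches = [(ipaddress.ip_network(p), lab) for p, _prec, lab in POLICY_TABLE
               if a in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def choose_source(candidates: List[str], dst: str) -> str:
    """Rule 6 only: the first candidate whose label equals the destination's
    label wins; otherwise fall back to the first candidate."""
    want = label(dst)
    for src in candidates:
        if label(src) == want:
            return src
    return candidates[0]
```

With this table, a host holding 2001:db8:a::1 and 2001:db8:b::1 picks the PBSite address (and hence ISPB) for destinations in 2001:db8:200::/48 and the PASite address for destinations in 2001:db8:100::/48.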



5 Conclusions 

In this note we have presented the case for the adoption of SAD routing in multihomed environments. The scalability limitations of the current multihoming solution, based on Provider Independent addressing, have been widely acknowledged by the Internet community, and there is consensus that only a new multihoming solution compatible with PA addressing will preserve the scalability of the IPv6 interdomain routing system. However, the adoption of PA addressing in multihomed environments implies that multihomed sites need to configure internally as many prefixes as providers they multihome to, causing several difficulties, such as incompatibilities with ingress filtering and the inability to preserve established connections through outages. This is basically because, when multiple PA prefixes are present in the multihomed site, the source address selection process determines the ISP used in the communication: to preserve ingress filtering compatibility, the packet has to be forwarded through the ISP that is compatible with the selected source address. Current destination-based routing does not take the source address of the packet into account, which makes it unsuitable for providing compatibility with ingress filtering, a mechanism based on the source address. SAD routing is then the natural option to overcome the difficulties caused by ingress filtering. Moreover, once SAD routing is available in the multihomed site, additional benefits such as fault tolerance and traffic engineering capabilities can be obtained with reduced complexity. SAD routing is not a new technology and is available in some form in current router implementations, which facilitates its adoption and deployment. However, SAD routing is currently a special feature whose applicability has been limited to very specific scenarios. If SAD routing is adopted as a fundamental part of the IPv6 multihoming solution, as proposed in this note, the expected number of multihomed sites would imply a massive adoption of SAD routing technology.
Abstract. This paper considers the problem of service differentiation in an optical packet switched backbone. We propose and analyze a QoS routing approach based on different routing and congestion management strategies for different classes of service. Congestion resolution is achieved in the wavelength and time domains, and QoS differentiation in a single node is achieved by resource reservation in the wavelength domain. This is combined with alternate routing at the network level. We show that the proposed strategy guarantees very good performance to high-priority traffic with very limited impact on low-priority traffic.

Keywords: Optical packet switching, adaptive routing, optical buffer, QoS. 



1 Introduction 

One of the emerging needs in present-day networking is the support of multimedia applications, which demand real-time information transfer with very limited loss to provide end users with acceptable quality of service (QoS). At the same time, economics require efficient use of network resources.

Assuming that internetworking will be provided by the IP protocol, and given its
inability to manage QoS, techniques for QoS differentiation must be implemented in
the transport networks. Significant effort has been devoted to defining QoS models.
In backbone networks the most interesting solutions proposed for the QoS problem
deal with a limited number of service classes collecting aggregates of traffic flows
with similar requirements. This approach can greatly improve scalability and reduce
operational complexity [1].

In the near future, high-capacity circuit-switched optical backbones will provide
huge bandwidth capacity but with limited flexibility in terms of bandwidth allocation
and QoS management. Optical burst switching (OBS) and optical packet switching
(OPS) are, respectively, medium- and longer-term networking solutions that promise
more flexibility and efficiency in bandwidth usage combined with the ability to
support diverse services [2].

In this paper we will focus on an OPS network even though, as explained in the 
following, the results presented here may be meaningful also with reference to OBS 
networks that implement queuing at intermediate nodes. 
Aggregate QoS solutions such as DiffServ are a viable approach for OPS networks too
although, because of the limitations of optical technology, the number of QoS
classes must be kept small to minimize operational effort. In fact, complex
scheduling algorithms are not applicable because of the peculiarity of queues in the
optical domain: being implemented by means of delay lines, they usually provide very
limited queuing space and do not allow random access [3]. This means that
traditional priority-based queuing strategies are not feasible in OPS networks, and
QoS differentiation can be achieved only by means of resource reservation strategies.

We have shown in previous works that QoS differentiation in an OPS network can
be provided with good flexibility and limited queuing requirements by means of
resource reservation in both the time and wavelength domains [4][5]. These works
deal with algorithms for QoS management implemented at the switching node level.
Other opportunities arise when considering routing decisions.

In this paper we extend the study of QoS differentiation mechanisms at the 
network level, by investigating how the network topology properties can be exploited 
together with suitable QoS algorithms in order to differentiate the quality along the 
network paths. 

First, the QoS management issues in standalone optical packet switches and the QoS
management approach for the whole network are described in section 2. The approach
is then applied, in section 3, to a sample network to analyze the influence of
different alternatives. In section 4 a network design procedure, given the topology
and the traffic matrix, is presented and, finally, in section 5 some conclusions are
drawn.



2 QoS Management in OPS Networks 

We assume a network capable of switching asynchronous, variable-length packets or
bursts. Therefore the results presented in the following may refer both to an OBS
network implementing queuing in the nodes [6] and to an OPS network [7]. In the
following we will generally refer to an OPS network and assume that two classes of
traffic exist, namely high priority (HP) and low priority (LP).

We consider optical switches that resolve contention by means of the wavelength
and time domains. We do not deal with switching matrix implementation issues and
consider a general switching node with N input and N output fibers, carrying W
wavelengths each. The switch control logic reads the burst/packet header and chooses
the proper output fiber among the N available.

Packets contending for the same output are multiplexed in the wavelength domain
(up to W packets may be transmitted at the same time on one fiber) and, if necessary,
in the time domain by queuing, implemented with fiber delay lines (FDLs). The FDL
buffer stores packets waiting to be transmitted but does not allow random access to
the queue. Therefore the order of packets leaving the buffer cannot be changed
and priority queuing is not applicable. Thus QoS management must rely on
mechanisms based on a-priori access control to the optical buffers.

In general, after the output fiber has been determined, the switch control logic must
face a two-dimensional scheduling problem: choose the wavelength and, if necessary,
the delay to be assigned to the packet. This problem is called the wavelength and
delay selection (WDS) problem. An optimal solution to the WDS problem is hardly
feasible because of its computational complexity, and heuristics have been proposed
in the past [5][8][9]. Here we use the minimum gap algorithm (MING) [8], which has
been shown to realize a good trade-off between complexity and performance. This
algorithm performs the wavelength assignment by selecting the queue that
introduces the smallest gap between the new packet and the last buffered one.
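The minimum-gap idea can be illustrated with a short Python sketch. It is not the MING implementation of [8]; it assumes a single output fiber where `free_at[w]` is the time at which the last packet queued on wavelength `w` ends, an FDL buffer offering delays k·D for k = 0, ..., B, and, as one plausible convention, that the gap of an idle wavelength is the time elapsed since its last packet ended. The function name `ming_select` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def ming_select(free_at: Sequence[float], t: float, D: float, B: int) -> Optional[Tuple[int, int]]:
    """Minimum-gap wavelength/delay selection for one output fiber (illustrative sketch).

    free_at[w]: time at which the last packet queued on wavelength w ends.
    t: arrival time of the new packet; D: FDL delay unit; B: longest delay (in units of D).
    Returns (wavelength, delay in units of D) with the smallest gap, or None if no
    wavelength can be reached within B delay units (the packet is dropped).
    """
    best = None  # (gap, wavelength, delay units)
    for w, free in enumerate(free_at):
        if free <= t:
            k = 0                           # wavelength idle: no delay needed
        else:
            k = math.ceil((free - t) / D)   # shortest delay line that clears the backlog
        if k > B:
            continue                        # backlog longer than the FDL buffer
        gap = t + k * D - free              # unused time between last packet and new one
        if best is None or gap < best[0]:
            best = (gap, w, k)
    return None if best is None else (best[1], best[2])
```

Ties are broken in favor of the lowest wavelength index. After a packet of duration L is scheduled on (w, k), the caller would set `free_at[w] = t + k*D + L`.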

In this scenario QoS differentiation is achievable in the node by differentiating the
amount of resources to which the WDS algorithm is applied. We have already shown
in [5] that this can be done adopting either a threshold-based or a wavelength-based
technique. In the former case, the reservation is applied to the delay units: the WDS
algorithm drops incoming LP packets if the current buffer occupancy is such that the
required delay is greater than or equal to the threshold, while HP packets have access
to the whole buffer capacity. In the latter case the reservation is applied to the
wavelengths: a subset of K<W wavelengths on any output fiber is shared between HP
and LP packets, while the remaining W-K wavelengths are reserved for HP packets.
Generally speaking, wavelength reservation is more promising because the larger
amount of resources available provides more flexibility to the algorithms. WDM
systems are continuously improving and the number of available wavelengths per
fiber keeps growing. On the other hand, FDL buffers are bulky and should be kept as
small as possible; therefore the number of delays that can be implemented is fairly
limited and is unlikely to improve much in the future.
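A minimal sketch of the threshold-based (delay-domain) reservation follows, assuming that the buffer offers delays k·D for k = 0, ..., B. The names `required_delay_units` and `admit_threshold` are hypothetical helpers, not taken from [5].

```python
import math


def required_delay_units(free_at: float, t: float, D: float) -> int:
    """Delay (in units of D) needed by a packet arriving at t on a wavelength busy until free_at."""
    return 0 if free_at <= t else math.ceil((free_at - t) / D)


def admit_threshold(k: int, high_priority: bool, threshold: int, B: int) -> bool:
    """Threshold-based reservation in the time domain (hypothetical helper).

    HP packets may use the whole buffer (delays 0..B); LP packets are dropped when
    the required delay k is greater than or equal to the threshold.
    """
    if k > B:
        return False        # not even HP packets fit in the FDL buffer
    return high_priority or k < threshold
```

For example, a packet arriving at t = 5 on a wavelength busy until 8, with D = 1, needs 3 delay units: with a threshold of 3 it is dropped if LP and accepted if HP.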

The approach above provides QoS differentiation at the single network node, but
does not tackle the problem at the network level. A further step is to define QoS
routing algorithms that obtain additional service differentiation by combining QoS
management at the routing level with QoS management in the WDS algorithms.

This paper assumes a meshed network topology and shortest path routing. Traffic
is normally forwarded along the shortest path, but alternate paths of equal or longer
length are also identified and can be used. We define two possible routing strategies:

- Single Link Choice (SL), which implements conventional shortest path routing
based on minimum hop count and does not use alternate paths;

- Multiple Alternative (MA), which, besides the shortest path, computes alternate
paths that are used by the network nodes when the link along the shortest path
(also called default link) becomes congested.
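One way to derive the default link and the alternates of the two strategies is sketched below on a hypothetical five-node topology (not the network of Fig. 1). Hop counts come from a breadth-first search from the destination. As an assumption, MA only keeps neighbors that are not farther from the destination than the current node, which yields alternate paths of equal or one-hop-longer length; a real implementation would also need a loop-prevention rule such as a hop limit. `hop_distances` and `next_hops` are hypothetical names.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency lists; links assumed bidirectional


def hop_distances(graph: Graph, dst: str) -> Dict[str, int]:
    """Minimum hop count from every node to dst, via BFS from dst."""
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def next_hops(graph: Graph, node: str, dst: str, strategy: str) -> List[str]:
    """Ordered next-hop candidates at `node` towards `dst`.

    SL: only the default (shortest-path) link.
    MA: default link first, then alternates of equal or one-hop-longer length.
    """
    dist = hop_distances(graph, dst)
    candidates = sorted(
        (v for v in graph[node] if v in dist and dist[v] <= dist[node]),
        key=lambda v: (dist[v], v),  # shorter remaining path first, ties by name
    )
    return candidates[:1] if strategy == "SL" else candidates


# Hypothetical topology: a five-node ring A-B-C-D-E with a chord A-C.
graph = {"A": ["B", "C", "E"], "B": ["A", "C"], "C": ["A", "B", "D"],
         "D": ["C", "E"], "E": ["A", "D"]}
```

At node A towards D, SL keeps only C, while MA orders the candidates as C, E (both of equal length) and then B (one hop longer).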

QoS management is achievable by differentiating the concept of congestion and/or 
providing different alternatives to LP and HP traffic. 

The proposal analyzed in this work is as follows. 

- The WDS algorithm works with wavelength reservation according to a partial
sharing approach: H out of the W available wavelengths are reserved for HP traffic,
while the remaining W-H are shared between HP and LP traffic. Two options are
considered:

- The H reserved wavelengths may be fixed, namely the W available wavelengths
are ordered and the reserved wavelengths are $\lambda_i$ with $i = 1, \dots, H$ (FIX strategy).

- Any H wavelengths are reserved based on the actual occupancy, namely when
fewer than W-H wavelengths are busy (i.e., more than H are free) both LP and HP
packets may be transmitted, otherwise only HP packets can be transmitted
(RES strategy).
- In the routing algorithms, congestion is defined according to wavelength
occupancy: when at least T out of W wavelengths are busy, the fiber (and the path it
belongs to) is considered congested. The value of T is different for different classes
of service: for LP traffic $T_{LP} = W-H < W$, while for HP traffic $T_{HP} = W$. This
means that for the LP class congestion arises earlier, and alternative paths, if any,
are used more frequently.

- Alternate routing is used for LP traffic but not for HP traffic. Therefore HP traffic
is always routed with an SL choice, while LP traffic is routed with an MA choice, and
alternate paths are used when congestion is present.
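The per-class rules above can be sketched in Python as follows. Wavelengths are indexed from 0, so the FIX pool $\lambda_1, \dots, \lambda_H$ becomes indices 0..H-1. Under RES an LP packet is accepted only while more than H wavelengths are free, i.e., fewer than W-H are busy, which we assume is the intended reading given the LP congestion threshold $T_{LP} = W-H$. The fallback to the default link when every LP candidate is congested is also an assumption, and all function names are hypothetical.

```python
from typing import Dict, List


def lp_may_use(busy: List[bool], w: int, H: int, strategy: str) -> bool:
    """Can an LP packet be sent on wavelength w of an output fiber?"""
    if busy[w]:
        return False
    if strategy == "FIX":
        return w >= H                 # wavelengths 0..H-1 are permanently reserved for HP
    if strategy == "RES":
        return busy.count(False) > H  # any H free wavelengths are kept for HP
    raise ValueError(f"unknown strategy: {strategy}")


def hp_may_use(busy: List[bool], w: int) -> bool:
    """HP packets may use any free wavelength under both strategies."""
    return not busy[w]


def is_congested(busy_count: int, T: int) -> bool:
    """A fiber (and its path) is congested when at least T wavelengths are busy."""
    return busy_count >= T


def choose_link(candidates: List[str], busy_count: Dict[str, int],
                W: int, H: int, high_priority: bool) -> str:
    """Per-class route choice: HP always uses SL; LP uses MA with T_LP = W - H."""
    if high_priority:
        return candidates[0]          # SL: HP always takes the default link
    for link in candidates:           # MA: default link first, then alternates
        if not is_congested(busy_count[link], W - H):
            return link
    return candidates[0]              # all congested (assumption): stay on the default link
```

For example, with W = 16, H = 3 and only wavelengths 13 to 15 free, FIX still admits an LP packet on wavelength 13, whereas RES keeps the three free wavelengths for HP traffic.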

The basic idea behind this approach is that the HP traffic stream should be preserved
intact as much as possible. Congestion and alternate routing modify the traffic
stream because of loss, delay, out-of-sequence delivery, etc. Therefore we reserve
resources for HP traffic to limit congestion phenomena, and do not rely on alternate
routing for it, so as to avoid out-of-order packets as much as possible.



3 Network Performance Analysis 

In this section we provide numerical results showing that the proposed techniques
for QoS management can provide service differentiation at the network level.
Performance is evaluated in terms of packet loss probability; due to lack of space,
other performance parameters such as delay and out-of-order packet delivery are not
shown here. Numerical results were obtained with an ad hoc, event-driven simulator.
The reference network topology is shown in Fig. 1 and consists of 5 nodes
interconnected by 12 fiber links carrying 16 wavelengths each. Traffic enters the
network at any node and is addressed to any other node according to a given traffic
matrix.




Fig. 1. The reference network topology 

The network adopts a connectionless transfer mode, with traffic generated by a
Poisson process. The packet size distribution is exponential with average value equal
to the buffer delay unit D measured in bytes. This choice minimizes the packet loss at
the node level when adopting the MING algorithm [8]. The number of simulated
packets is up to $10^8$. The traffic matrix has been set up as follows, with two
alternatives.
- Balanced traffic matrix (B): the traffic distribution in the network is uniform, i.e.,
each wavelength carries the same average load (80%). With this approach the input
load at different ingress points of the network is generally not the same.

- Unbalanced traffic matrix (U): the traffic load at the network ingresses is assumed
to be the same. As a consequence, links have different average loads per
wavelength, with the only constraint that the maximum value cannot exceed a fixed
value (80% in our simulations).
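A traffic source with these statistics can be sketched as follows. Time is measured in units of the delay unit D (the transmission time of D bytes), so the mean packet duration is 1 and an average load ρ per wavelength on a W-wavelength fiber corresponds to an arrival rate of ρ·W packets per time unit. The function and parameter names are hypothetical, and this is not the simulator used for the results.

```python
import random
from typing import List, Tuple


def generate_packets(n: int, load_per_wavelength: float, W: int,
                     seed: int = 1) -> List[Tuple[float, float]]:
    """Return n (arrival time, duration) pairs offered to one W-wavelength fiber.

    Time unit: transmission time of D bytes, so exponential durations have mean 1.
    Poisson arrivals with rate rho * W give an average load of rho per wavelength.
    """
    rng = random.Random(seed)
    rate = load_per_wavelength * W
    t = 0.0
    packets = []
    for _ in range(n):
        t += rng.expovariate(rate)                  # exponential inter-arrival times
        packets.append((t, rng.expovariate(1.0)))   # exponential duration, mean D
    return packets
```

With `load_per_wavelength=0.8` and `W=16`, the offered load measured over a long run approaches 80% per wavelength, the value used for both traffic matrices.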

Since in the balanced case every link is loaded in the same way, the average loss
probability of the whole network can be used as the evaluation parameter. In the
unbalanced case, on the other hand, this parameter may not be representative of
performance, so the worst loss probability among all links is taken into account.
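The two evaluation criteria can be written as a small helper. As an assumption, the network-wide average is computed as total lost packets over total offered packets; `link_stats` and `network_loss` are hypothetical names.

```python
from typing import Dict, Tuple


def network_loss(link_stats: Dict[str, Tuple[int, int]], balanced: bool) -> float:
    """link_stats maps a link name to (packets offered, packets lost)."""
    if balanced:
        offered = sum(o for o, _ in link_stats.values())
        lost = sum(x for _, x in link_stats.values())
        return lost / offered                           # network-wide average loss
    return max(x / o for o, x in link_stats.values())   # worst link
```

With per-link losses of 0.1%, 0.3% and 0%, the balanced criterion averages them by traffic volume, while the unbalanced one reports 0.3%.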
Fig. 2. Comparison between FIX and RES.

First, Fig. 2 compares the FIX and RES strategies for wavelength reservation. The
graph clearly shows that RES performs better for both traffic classes. This was
expected and is due to better exploitation of the network resources (the wavelengths,
in this specific case): with RES the reserved wavelength pool is dynamically adjusted
to the current state of traffic requests, with a sort of “call packing” approach.
Because of the clear advantage of this reservation strategy, in the rest of the paper
we always assume that RES is used.

In Fig. 3 the packet loss probability is shown for different routing algorithms (SL 
and MA) and different traffic matrices (B and U) considering undifferentiated traffic. 

It is clear that MA performs better than SL, even though the gain is not large.
This can be explained by considering that the network topology involves only a
limited number of hops, so not many routing alternatives can actually be exploited.
In [10] it is shown that dynamic algorithms perform much better than static ones in
bigger and more complex networks. However, it is important to take into account
that MA keeps packets within the network longer, so the delay becomes larger than
in the SL case. This is why the routing of HP packets always adopts SL. Therefore
the choice between SL and MA is relevant only to the routing of LP packets. Since
results for the balanced and unbalanced traffic matrices are very similar, only the
balanced case is shown in the following due to the limited space available.

Fig. 3. Packet loss probability for SL and MA algorithms for undifferentiated traffic and for
both cases of balanced and unbalanced traffic matrix.

Fig. 4. Packet loss probability for high and low priority classes with balanced traffic matrix,
varying the number of reserved wavelengths (a) or the percentage of HP traffic (b).

In order to understand how different choices affect the network behavior, Fig. 4
shows the results obtained with different approaches. First we evaluate performance
assuming a fixed HP class share of 20%, while the number of wavelengths reserved
for the HP class varies from 1 to 4 out of 16 on each link (Fig. 4a). Then the
percentage of HP input traffic varies between 10% and 50%, while the number of
wavelengths reserved for the HP class is fixed to 3 on each link (Fig. 4b).
As expected, the higher the number of dedicated wavelengths, the larger the gain in
loss probability for the HP class. When the number of reserved wavelengths grows
from 1 to 4, the HP loss probability improves by three orders of magnitude, while the
performance of the LP class is barely affected: its packet loss probability remains
nearly constant, about one order of magnitude worse than in the undifferentiated
case. The HP class reaches a very low packet loss probability ($10^{-7}$) when the
reservation equals 25% (i.e., 4 wavelengths) of the whole set. Moreover, HP
performance is not affected by the routing strategy adopted for the LP class.

On the other hand, when the HP traffic share varies, a good level of performance is
achieved for a very low percentage of HP traffic. When HP traffic grows beyond
10%, its performance starts getting worse quite rapidly, while the LP class again
seems only slightly affected. It follows that, if a given loss probability is required by
HP traffic, admission to the network must be controlled in order to avoid
performance degradation due to the limited resources reserved for the HP class.

A good degree of differentiation between the two classes is obtained in both
cases, up to four orders of magnitude, while the adaptive routing strategy for LP
traffic allows a further performance improvement.

The results presented in Fig. 5a show how the amount of reserved resources and
the percentage of HP traffic are related when a given HP class packet loss
probability (PLP) is required. Only the SL algorithm is considered here. As expected,
if a given packet loss probability has to be guaranteed for an increasing percentage
of HP traffic, more resources must be reserved. Fig. 5b shows the corresponding
performance of the LP class.
Fig. 5. Relation between the HP traffic percentage and the percentage of resources reserved for
given packet loss probability (a) and corresponding LP traffic performance (b).



4 Network Design 

In this section a network design procedure is presented. The reference network is the
same as above, but the aim now is to calculate, for the SL routing algorithm and the
adopted traffic matrix, the number of wavelengths required per fiber so that a given
average load per wavelength is obtained. The main assumption is that all nodes
generate the same input traffic, which is uniformly distributed to the other nodes.

Fig. 6. Packet loss probability for high and low priority classes resulting from the network
design procedure as a function of the percentage of resources dedicated to the HP class
with 20% of HP class traffic.

Fig. 7. Percentage of additional wavelengths required by each link for different loss
constraints.

Table 1. Number of wavelengths required to achieve different packet loss probabilities.

The input traffic value is chosen so that the total number of wavelengths is very
close to 16 × 12 = 192, as in the previous case, with each wavelength loaded at 80%.
This allows a fairer comparison at almost the same cost in terms of wavelengths.
The resulting resource distribution varies from 7 to 28 wavelengths per link. In the
design procedure it is important to adopt the MA approach, otherwise performance
decreases: SL does not share wavelength resources among paths and therefore does
not achieve load balancing. With the MA approach performance is the same as in the
balanced case, with the advantage that the traffic matrix is now imposed by user
needs instead of by the network configuration as before.
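The basic dimensioning step can be sketched as follows: route each node pair's demand on a minimum-hop path, accumulate the load on each directed link, and size each link so that no wavelength carries more than the target load (80% here). The topology, the demand value (in Erlangs, i.e., average number of busy wavelengths) and the function names are hypothetical, and the scaling of the input traffic to reach about 192 wavelengths in total is not reproduced.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # adjacency lists of a connected topology


def shortest_path(graph: Graph, src: str, dst: str) -> List[str]:
    """Minimum-hop path from src to dst found by BFS (ties broken by adjacency order)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def dimension(graph: Graph, demand: float,
              target_load: float = 0.8) -> Dict[Tuple[str, str], int]:
    """Wavelengths per directed link so that each wavelength carries at most target_load."""
    load: Dict[Tuple[str, str], float] = {}
    for src in graph:
        for dst in graph:
            if src == dst:
                continue
            path = shortest_path(graph, src, dst)
            for link in zip(path, path[1:]):
                load[link] = load.get(link, 0.0) + demand
    # The small epsilon guards against float error such as 4 / 0.8 -> 5.000...1.
    return {link: math.ceil(a / target_load - 1e-9) for link, a in load.items()}
```

For a three-node line A-B-C with 1 Erlang per ordered pair, every directed link carries 2 Erlangs and therefore needs ⌈2/0.8⌉ = 3 wavelengths.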

In Fig. 6 the performance of SL and MA is shown for both classes, varying the
percentage of wavelengths reserved for HP traffic, with the HP traffic share set to
20%. Obviously, when the percentage of reserved wavelengths is low, service
differentiation between the two classes is poor. Moreover, the HP class curve is not
as smooth as before. This is because losses are not uniformly distributed over the
links, but vary from link to link (the range extends to $10^{-6}$). This happens because
different numbers of wavelengths are available on different links due to the
dimensioning procedure, providing different levels of wavelength multiplexing; links
with fewer wavelengths experience worse performance. Thus the overall HP packet
loss probability curve starts improving when more resources are added to these
specific links. The LP class seems less affected: its loss probability remains of the
same order of magnitude, with MA performing better than SL as usual.

To improve the network design, a maximum acceptable packet loss probability per
link may be fixed. The simulation is then iterated, adding wavelengths to the links
that show higher losses until the loss constraint is satisfied. The drawback of this
methodology is the longer simulation time. The average wavelength load is initially
set to 80% but, of course, when more resources are added some links end up less
loaded. Moreover, at the end of the process the links still do not have the same loss
probability, but all of them satisfy the loss requirement. Fig. 7 shows the number of
additional wavelengths (as a percentage of the starting number) that must be added
to each link in order to meet different packet loss requirements.
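The iteration can be sketched as a loop that adds one wavelength to every link violating the loss bound until none remains. The paper evaluates losses by simulation; to keep the sketch runnable we substitute the Erlang-B formula as the per-link loss estimator. Erlang-B models a bufferless link without FDLs or class reservation, so it is only a stand-in. Link names, loads (7 and 28 wavelengths at 80% load) and function names are illustrative.

```python
from typing import Dict, Tuple


def erlang_b(servers: int, offered: float) -> float:
    """Erlang-B blocking probability via the standard recursion (stand-in loss model)."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered * b / (k + offered * b)
    return b


def add_wavelengths(load: Dict[str, float], wavelengths: Dict[str, int],
                    max_loss: float, max_iter: int = 1000) -> Tuple[Dict[str, int], int]:
    """Add one wavelength per iteration to every link whose estimated loss exceeds max_loss."""
    w = dict(wavelengths)
    for iteration in range(max_iter):
        violating = [link for link in w if erlang_b(w[link], load[link]) > max_loss]
        if not violating:
            return w, iteration        # every link meets the constraint
        for link in violating:
            w[link] += 1
    raise RuntimeError("loss constraint not met within max_iter iterations")
```

As in the procedure described above, the final links meet the loss bound without ending up with equal losses, and lightly loaded links stop growing earlier than heavily loaded ones.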

Table 1 shows the number of iterations and the corresponding number of wavelengths
per link required to achieve different loss constraints.



5 Conclusions 

In this paper the problem of quality of service differentiation in DWDM
packet-switched optical networks has been addressed. The effects of quality of
service routing have been shown by applying dynamic wavelength management on
each link jointly with static or dynamic routing strategies. Different quality of service
algorithms have been analyzed and then applied to a network dimensioning
procedure. The sharing effect produced by the dynamic routing algorithm proved to
be particularly effective in this situation. An iterative procedure has then been
applied to achieve loss balancing over the network links with respect to design
constraints. The main conclusion is that the use of the assigned wavelengths is
optimized with respect to the performance target. Delay and out-of-order delivery, as
well as the application of the algorithms to more complex network topologies, are
currently under investigation and will be the subject of future work.
Abstract. This paper proposes a novel and incremental approach to Inter-Domain
QoS Routing. Our approach is to provide a completely distributed Overlay
Architecture and a routing layer for dynamic QoS provisioning, and to use QoS
extensions and Traffic Engineering capabilities of the underlying BGP layer for static
QoS provisioning. Our focus is mainly on influencing how traffic is exchanged
among non-directly connected multi-homed Autonomous Systems based on specific
QoS parameters. We provide evidence supporting the feasibility of our approach by
means of simulation.

Keywords: Inter-domain QoS Routing, Overlay, BGP. 



1 Introduction 

At present, nearly 80% of the more than 15000 Autonomous Systems (ASs) that
compose the Internet are stub ASs [1], and the majority of these are multi-homed.
For these ASs, Quality of Service Routing (QoSR) at the inter-domain level is a strong
need [2]. Whereas some research groups rely on QoS and Traffic Engineering (TE)
extensions to BGP [3-4], others tend to avoid new enhancements to the protocol and
propose Overlay networks to address the subject [5-6]. While the former approach
provides significant improvements for internets under low routing dynamics, the
latter is more effective when routing changes occur more frequently. The main idea
behind the overlay concept is to decouple part of the policy control portion of the
routing process from BGP devices. In this sense, the two approaches differ in how
policies are controlled and signaled: BGP enhancements tend to provide in-band
signaling, while the overlay approach provides out-of-band signaling.
E-NEXT under contract FP6-506869.
The Overlay Architecture is most appropriate when communicating domains are
multi-homed, and thus may need some mechanism to rapidly change their traffic
behavior depending on network conditions. In fact, multi-homing is the trend
exhibited by most stub ASs in today's Internet, mainly to achieve load balancing and
fault tolerance on their connection to the network [5]. In addition, present
inter-domain traffic characteristics reveal that, even though an AS exchanges traffic
with most of the Internet, only a small number of ASs is responsible for a large
fraction of the existing traffic. Moreover, this traffic is mainly exchanged among ASs
that are not directly connected; instead, they are generally 2, 3 or 4 hops away [4].

The combination of all these features led us to focus on QoSR among strategically
selected non-peering multi-homed ASs. The approach to inter-domain QoSR we
propose in this paper is to supply a completely distributed Overlay Architecture and
a routing layer for dynamic QoS provisioning, while using QoS extensions and TE
capabilities of the underlying BGP layer for static QoS provisioning. Within the
overlay inter-domain routing structure reside special Overlay Entities (OEs), whose
main functionalities are the exchange of Service Level Agreements (SLAs),
end-to-end monitoring, and verification of compliance with the SLAs. These
functionalities allow the OEs to influence the behavior of the underlying BGP routing
layer, taking rapid and accurate decisions to bypass network problems such as link
failures or service degradation for a given Class of Service (CoS). Thanks to its
reactive nature, this overlay structure acts as a complementary layer conceived to
enhance the performance of the underlying BGP layer, which contains both
QoS-aware BGP (QBGP) routers and non-QoS-aware routers.

The remainder of this paper is organized as follows. Section 2 presents an overview
of our overlay approach. In Section 3 the main functionalities required from the
underlying BGP and overlay layers are analyzed, while Section 4 presents our
simulation scenario and results. Finally, Section 5 concludes the paper.



2 Overview of the Proposed Overlay Approach 

As stated in the Introduction, we propose in this paper a combined QBGP and
Overlay Architecture for inter-domain QoSR. The main ideas behind the Overlay
Architecture are:

- The OEs should respond nearly two orders of magnitude faster than the BGP layer 
in the case of a network failure. 

- The OEs should react and try to reroute traffic when non-compliant conditions 
concerning QoS parameters previously negotiated for a given CoS are detected. 

- The underlying BGP structure does not need modifications, and remains unaware 
of the QoSR architecture running on top of it. 

Fig. 1 depicts a possible scenario where our proposal could be applied. In this
model, two peering OEs belonging to different ASs several AS hops apart are able to
exchange an SLA and agree upon a set of QoS parameters concerning the traffic
between them. The intermediate ASs do not need to participate in the Overlay
Architecture, and therefore no OEs are needed within these transit ASs. From our
perspective, the real challenge is to develop a completely distributed overlay system,
where each OE behaves in a reflective manner. In this sense our overlay approach is
like facing a mirror. Instead of proposing a complex scheme to dynamically and
accurately manage how traffic enters a target AS, we focus on how traffic should exit
from the source AS. Hence, we seek that the OE within the source AS behaves like
the mirror image of the OE in the target AS. This mirroring scheme allows the OE in
the source AS to dynamically manage its outgoing traffic to the target AS, depending
on compliance with the previously established SLA for a given set of CoSs. Within
each AS, the OE should measure end-to-end QoS parameters along every link
connecting the multi-homed AS to the Internet, and check for SLA violations.
Henceforth, we assume that the topology has at least two different end-to-end paths
between any pair of remote ASs participating in our QoSR model. When a violation
is detected, the OE in the source AS is capable of reconfiguring on the fly its traffic
pattern to the remote AS for the affected CoS. The time scale needed to detect and
react to a given problem is very small compared with the BGP time scale [7].

Fig. 1. Inter-domain QoSR scenario where OEs are used for dynamic QoS provisioning among
remote multi-homed ASs

The end-to-end measurements are based on active AS path probing among peering
OEs. Each OE within an AS spawns probes targeting the remote AS through every
available link connecting the source AS to the Internet. We claim, and show by
simulation, that AS-to-AS probing is demanding neither in terms of traffic nor in
terms of processing, as long as the number of overlay peering ASs and the number
of CoSs remain limited. In fact, the traffic generated between two OEs is negligible.
It is worth noting that a non-compliant condition may occur in a single direction of
the traffic only, which means that the bottleneck is either on the upstream or on the
downstream path. For example, in Fig. 1 the OEs in AS 1 and AS 2 measure the
same parameters, such as One-Way Delay (OWD) [8] or One-Way Loss (OWL) [9],
and react in the same manner due to their mirrored behavior. Therefore, either of
them is able to decide independently whether to shift its outbound traffic or not. An
advantage of this approach is that BGP updates could be completely avoided if, for
example, the LOCAL PREFERENCE (LOCAL_PREF) attribute is used when
reallocating this traffic.

Agarwal et al. proposed an interesting overlay mechanism to reduce the fail-over
time and to achieve load balancing of traffic entering an AS [5]. However, this
proposal does not reuse any QoS or TE capabilities of the BGP layer. Moreover, it
introduces a centralized and complex server that allows an AS to infer, by means of
heuristics, the topology and customer/peer relationships among the multiple ASs that
make up all candidate paths known to any given peering AS in the overlay structure.
The complexity introduced is mainly due to the fact that accurately controlling how
traffic enters an AS is a very intricate task, particularly when this must be done
dynamically. As an alternative, our proposal deals with the allocation of traffic from
the source AS, since we strongly believe that simpler approaches such as this one are
more likely to be deployed.



3 Main Functionalities of the Routing Layers 

In this Section we describe in detail the overlay routing functionalities as well as the 
underlay BGP routing functionalities. 



3.1 Top Layer: Overlay Routing Functionalities 

This layer is composed of a set of OEs:

3.1.1 Basic Set of Components 

- At least one OE exists per QoS domain. 

- An OE has full access to the border BGP routers within an AS. 

- An OE has algorithms for both detecting non-conforming conditions for a given 
CoS, and deciding when and how to reallocate its traffic. 

3.1.2 Main Components 

An Overlay Protocol: A protocol between remote OEs is needed. This protocol
allows OEs to exchange SLAs with each other, as well as other information essential
to the Overlay Architecture.

Metric Selection: In order to validate our approach, we choose a simple QoS
parameter for the dynamic portion of our QoSR model. The parameter we have
selected is a smoothed OWD (SOWD), defined by the following metric:

$$\overline{OWD}(m,n) = \frac{1}{N} \sum_{k=n}^{n+N-1} OWD(m,k) \qquad (1)$$

where $OWD(m,k)$ is the OWD measured by the $k$-th probe generated by the source
OE and sent through the $m$-th external link of the AS. This SOWD is the average OWD
over a sliding window of size $N$. Instead of using instantaneous OWD values, we
propose to use this low-pass filter, which smooths the OWD and avoids rapid changes
in our metric. A trade-off exists in terms of the window size. A large value of $N$
implies a slow reaction when network conditions change and traffic reallocation may
be needed. On the other hand, small values of $N$ could translate into frequent traffic
reallocations, since non-compliant conditions are likely to be met more frequently. In
this scenario the SLA exchanged by the OEs is simply the maximum SOWD $D_j$
tolerated for each CoS $C_j$.
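A minimal sketch of the SOWD bookkeeping for one external link is given below. It averages the latest N samples, i.e., the window of equation (1) ending at the most recent probe; averaging over the samples available before the window fills is our assumption. `SmoothedOWD` and `violates_sla` are hypothetical names.

```python
from collections import deque
from typing import Deque


class SmoothedOWD:
    """Sliding-window average of the last N one-way delay samples on one external link."""

    def __init__(self, window: int):
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, owd: float) -> float:
        """Record a new OWD sample and return the current SOWD."""
        self.samples.append(owd)    # the oldest sample drops out once N are stored
        return sum(self.samples) / len(self.samples)


def violates_sla(sowd: float, max_sowd: float) -> bool:
    """Non-compliance for a CoS C_j: the smoothed OWD exceeds its agreed bound D_j."""
    return sowd > max_sowd
```

With N = 3 and samples 10, 20, 30, 40 ms, the SOWD after the last probe is 30 ms, which would violate an SLA bound of 25 ms; a larger N would react more slowly to the same increase.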

We assume that an OE uses one logical address for each CoS, and also that specific local policies are applied to Internal BGP (IBGP). Thus, a single OE can be configured to probe a remote OE for any given CoS through all available egress links of the AS in a round-robin fashion, thereby avoiding the hot-potato routing problem. The OEs then compute a per-CoS cost to reach the remote AS over every external link $m$ based on the metric above. Furthermore, packets probing a specific CoS belong to that CoS. For instance, in a QBGP framework based on Differentiated Services (DiffServ), when probing a particular CoS that is mapped to an Assured Forwarding (AF) class in each intermediate AS, the probes are tagged with the same AF class [10]. We assume that the OEs are properly synchronized (e.g., by means of GPS); the details of synchronization are out of the scope of this work.
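The round-robin probing described above can be sketched as follows, assuming that the per-CoS logical address and local IBGP policies let the OE pin each probe to a chosen egress link. The function name and link labels are hypothetical. Because every CoS cycles over every egress link, each external link gets measured, rather than only the exit that hot-potato routing would select.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, List, Tuple


def probe_schedule(egress_links: List[str], classes: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (CoS, egress link) pairs; each CoS cycles independently over all egress links.

    Hypothetical sketch: the actual probing order used by the OEs is not specified.
    """
    per_class: Dict[str, Iterator[str]] = {c: cycle(egress_links) for c in classes}
    while True:
        for c in classes:
            yield c, next(per_class[c])


# Usage: the first six probes for two classes over three hypothetical egress links
first = list(islice(probe_schedule(["m1", "m2", "m3"], ["CoS1", "CoS2"]), 6))
```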

Piggy-Backing mechanism: An important issue is that an active probing technique developed to measure the OWD requires feedback from the remote OE. However, in the mirroring scheme the remote OE is already probing the local OE and expects feedback from it as well. Thus, the easiest way to avoid unnecessary messages traversing the network is to endow the protocol between the OEs with a piggy-backing technique, so that the OWD feedback is carried on the probes themselves.
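A minimal sketch of the piggy-backing idea, assuming synchronized clocks (so that the OWD is simply receive time minus send time) and one endpoint per CoS. The `Probe` fields and the `ProbingEndpoint` class are hypothetical: each outgoing probe carries the OWD that this endpoint last measured on a probe from its peer, so no separate feedback message is needed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Probe:
    cos: str
    link: str                                           # egress link used by the sender
    seq: int                                            # probe index at the sender
    sent_at: float                                      # sender timestamp (clocks assumed synchronized)
    feedback: Optional[Tuple[str, int, float]] = None   # piggy-backed (link, seq, OWD)


class ProbingEndpoint:
    """Hypothetical per-CoS probing endpoint of an OE using piggy-backed feedback."""

    def __init__(self) -> None:
        self.seq = 0
        self.pending: Optional[Tuple[str, int, float]] = None  # OWD measured on the peer's last probe
        self.feedback_log: List[Tuple[str, int, float]] = []   # OWDs of our own probes, reported by the peer

    def send(self, cos: str, link: str, now: float) -> Probe:
        # Echo the most recent measurement back to the peer inside our own probe
        probe = Probe(cos, link, self.seq, now, self.pending)
        self.seq += 1
        self.pending = None
        return probe

    def receive(self, probe: Probe, now: float) -> None:
        # Measure the OWD of the incoming probe so it can be returned on our next probe
        self.pending = (probe.link, probe.seq, now - probe.sent_at)
        # Extract the piggy-backed OWD of one of our own earlier probes, if any
        if probe.feedback is not None:
            self.feedback_log.append(probe.feedback)


# Usage: two mirrored OEs exchange probes; feedback rides on the next probe
a, b = ProbingEndpoint(), ProbingEndpoint()
b.receive(a.send("CoS1", "m1", now=0.000), now=0.030)
a.receive(b.send("CoS1", "k1", now=1.000), now=1.025)
```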

Stability: Another central issue is that the traffic reallocation process should never generate network instability. To prevent this, while keeping in mind that we follow a completely distributed architecture in which the OEs must cope with these problems on their own, we impose the following restriction:

“Traffic targeting a certain CoS $C_j$ should never be reallocated over a link $s$, if and only if the primary link for $C_j$ was $s$ in $[t-T_h, t]$, or $C_j$ has exceeded its maximum number of possible reallocations $\Rightarrow R_j(t) \geq R_j^{max}$”

In this way, the parameter $T_h$ avoids short-term bounces, while the parameter $R_j^{max}$ avoids long-term ones. Each time a traffic reallocation takes place for a given CoS $C_j$, the variable $R_j(t)$ is incremented. Our approach provides a sort of soft penalization similar to BGP route flap damping [11]: the penalty is incremented by a fixed value $P$ with each new reallocation, but it decays exponentially with time when no reallocations occur, according to:

$$R_j(t) = R_j(t_0)\, e^{-\lambda_j (t - t_0)} \qquad (2)$$
where $t_0$ is the time of the last reallocation of $C_j$, and $T_h$, $R_j^{max}$, $P$ and $\lambda_j$ are configurable parameters whose values depend on the degrees of freedom in the number of short- and long-term reallocations we allow for a given CoS $C_j$. An additional stability challenge arises when a path becomes heavily loaded, since several CoSs within the path could experience conditions that do not comply with their respective SLAs. In order to prevent simultaneous reallocations of all the affected CoSs, we endow the OEs with a contention mechanism that prioritizes the different CoSs by relevance. More relevant CoSs are then reallocated faster than lower-priority classes. The contention algorithm operates as follows:

- Let $C_j$ be one of the $q$ affected CoSs within link $m$, where $j = 1, \ldots, q$.

- $C_j$ will be reallocated after a time $T_j \in [K_{j-1}, K_j)$, where $T_j$ is randomly selected and $K_0 = 0$.

⇒ Then, the highest-priority classes $C_1$ within link $m$ will be reallocated at a random time $T_1 \in [0, K_1)$, classes $C_2$ at a random time $T_2 \in [K_1, K_2)$, and so on.
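The contention rule above can be sketched as follows. The function name is hypothetical, and the bounds $K_1 = 3$ s, $K_2 = 6$ s, $K_3 = 9$ s in the usage are an assumption: they take the first value of Table 1's "Hold (contention) & $T_h$" column as $K_j$. Because the intervals $[K_{j-1}, K_j)$ are disjoint, higher-priority classes are always moved first.

```python
import random
from typing import Dict, List, Sequence


def contention_delays(affected: List[int], bounds: Sequence[float],
                      rng: random.Random) -> Dict[int, float]:
    """Draw a reallocation delay T_j in [K_{j-1}, K_j) for each affected class j.

    `affected` holds priority indices j (1 = highest); `bounds` is [K_0, K_1, ..., K_q]
    with K_0 = 0. Hypothetical helper for illustration.
    """
    delays: Dict[int, float] = {}
    for j in affected:
        low, high = bounds[j - 1], bounds[j]
        # rng.random() lies in [0, 1), so the delay falls in [low, high)
        # (up to floating-point rounding at the upper edge)
        delays[j] = low + rng.random() * (high - low)
    return delays


# Usage with assumed bounds K_1 = 3 s, K_2 = 6 s, K_3 = 9 s
K = [0.0, 3.0, 6.0, 9.0]
delays = contention_delays([1, 2, 3], K, random.Random(7))
order = sorted(delays, key=delays.get)   # classes in the order they will be moved
```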
Clearly, our contention mechanism allows an OE to reallocate traffic from a loaded path iteratively, and to check dynamically whether the remaining classes are still under non-compliant conditions. It is likely that as soon as traffic starts to be extracted from the path, the remaining classes will experience better end-to-end performance. A link failure, however, is a different situation: the OE should then reallocate all traffic from the affected path as fast as possible. There is thus a trade-off between the contention mechanism and the ability to rapidly redistribute all traffic from any given link. Instead of tuning the contention algorithm to cope efficiently with both problems at the same time, we rely on the probing technique, since a link failure causes the complete loss of probes for all the CoSs within the link. Our proposal is to increase the per-CoS probing frequency as soon as losses are detected. We maintain that this increase does not exacerbate the load on the network: first, because the traffic generated by the OE that detects the problem is a negligible fraction of the overall traffic exchanged between both ASs; and second, because the increase lasts only for a short period of time, with the sole aim of speeding up the rerouting process. Once a CoS is reallocated, the probing frequency decreases back to its normal value.
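The stability restriction and the decaying penalty of Eq. (2) can be sketched together as a per-CoS guard. This is a minimal sketch, not the authors' code: the class name is hypothetical, and it assumes a pure exponential decay with rate `lam` between reallocations, in the spirit of BGP route flap damping. $T_h$, $R_j^{max}$ and $P$ play the roles described above.

```python
import math
from typing import Dict


class ReallocationGuard:
    """Per-CoS stability check: hold-down T_h plus a decaying penalty R_j(t).

    Hypothetical sketch assuming R_j(t) = R_j(t0) * exp(-lam * (t - t0)).
    """

    def __init__(self, t_hold: float, r_max: float, penalty: float, lam: float) -> None:
        self.t_hold = t_hold      # T_h: forbids short-term bounces
        self.r_max = r_max        # R_j^max: bounds long-term reallocations
        self.penalty = penalty    # P: added on every reallocation
        self.lam = lam            # assumed decay rate (1/s)
        self._r0 = 0.0            # penalty value at time _t0
        self._t0 = 0.0
        self.left_at: Dict[str, float] = {}   # link -> time the class last stopped using it

    def r(self, t: float) -> float:
        """Current penalty R_j(t), decayed since the last reallocation."""
        return self._r0 * math.exp(-self.lam * (t - self._t0))

    def allowed(self, link: str, t: float) -> bool:
        """True unless `link` was primary within [t - T_h, t] or R_j(t) >= R_j^max."""
        left = self.left_at.get(link)
        recently_primary = left is not None and t - left <= self.t_hold
        return not recently_primary and self.r(t) < self.r_max

    def reallocate(self, old_link: str, new_link: str, t: float) -> None:
        if not self.allowed(new_link, t):
            raise RuntimeError("reallocation blocked by the stability restriction")
        self._r0 = self.r(t) + self.penalty   # decay first, then add P
        self._t0 = t
        self.left_at[old_link] = t


# Usage with hypothetical parameters (T_h = 8 s as for CoS1 in Table 1)
g = ReallocationGuard(t_hold=8.0, r_max=2.5, penalty=1.0, lam=0.1)
g.reallocate("m1", "m2", t=0.0)
```

Setting `r_max=math.inf` reproduces the simulation setting $R_j^{max} = \infty$, in which only the hold-down $T_h$ limits reallocations.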



3.2 Bottom Layer: Underlay BGP Routing Functionalities 

The set of routes to be tested by the OEs, using the probing techniques described in the previous subsection, is predetermined by the underlying BGP-based layer. Two types of devices can operate in this layer: legacy BGP routers and QBGP routers. A QBGP router is able to distribute QoS information and take per-CoS routing decisions constrained by the SLAs previously established between peering domains. In our model, QBGP routers can be seen as the practical tool to establish the overall inter-domain QoSR infrastructure, composed of several routing sub-layers, one for each CoS, which in addition can be dynamically influenced by the overlay layer. Interesting approaches and further information on QBGP can be found in [3, 12].



3.3 Combined QoSR Algorithm 

Fig. 2 depicts our combined QoSR algorithm. Let $m$ be the external link currently carrying traffic of class $C_j$. It is important to remark that, even though an alternative path could have a better cost in terms of SOWD, we avoid reallocating traffic of class $C_j$ from link $m$ until an SLA violation is detected. Two distinct threads of events occur upon the reception of a probe for class $C_j$. First, the probe $(k,l)$ is separated from the piggy-backed feedback $OWD(m,n)$. In order to reply accurately to the sender, $OWD(k,l)$ is processed first, which is shown as (I) in Fig. 2. Then the piggy-backed $OWD(m,n)$ is processed, which is depicted as (II) in the figure.

Once the SOWD is computed, the algorithm checks for violations of the maximum tolerable SOWD, that is, $D_j$. If no violation has occurred, the algorithm simply waits for the next incoming probe. However, if a violation is detected in link $m$, the algorithm checks whether the maximum number of allowed reallocations $R_j^{max}$ has been exceeded. If so, the local OE composes a feedback message to warn the




A Proposal for Inter-domain QoS Routing 263 



remote OE about this situation. The main idea is that the feedback process provides information to the remote OE, so that it can try to handle the problem by tuning its static QoS provisioning using either QBGP or TE-BGP.

If $R_j^{max}$ is not exceeded, the OE checks whether, among all available external links $p$ except $m$, there is at least one link $i$ whose cost $M_i$ satisfies the constraint for class $C_j$. It also needs to check whether that link has enough room to handle the reallocated class. Subsequently, to avoid any short-term bounce, the OE excludes from the set of capable links those that carried traffic of $C_j$ in $[t-T_h, t]$. Once this is done, we rely on QBGP to break ties when two or more links show the same SOWD cost. At this step a single link is left as the target for the reallocation of the class. The contention algorithm is then executed, and $T_j$ seconds later the OE checks whether the class still remains in a violating condition. If so, the OE increments $R_j(t)$ by $P$ and reroutes the traffic of $C_j$.
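The decision steps just described can be sketched as a single function. All names are hypothetical; `qbgp_rank` (lower is preferred) stands in for the QBGP tie-break, and the behaviour when no candidate link exists is an assumption, since the text does not specify it. The contention wait, the re-check, and the increment of $R_j(t)$ by $P$ are left to the caller.

```python
from typing import Dict, Optional, Set, Tuple


def overlay_decision(current: str, sowd: Dict[str, float], d_max: float,
                     r_now: float, r_max: float, has_room: Set[str],
                     recently_used: Set[str], qbgp_rank: Dict[str, int]
                     ) -> Tuple[str, Optional[str]]:
    """One pass of the combined QoSR decision for class C_j currently on link `current`.

    Returns an (action, target_link) pair. Hypothetical sketch for illustration.
    """
    # No SLA violation on the current link: keep traffic there, wait for the next probe
    if sowd[current] <= d_max:
        return ("wait", None)
    # Reallocation budget exhausted: warn the remote OE instead of rerouting
    if r_now >= r_max:
        return ("notify_remote_oe", None)
    # Other links meeting D_j, with spare room, not carrying C_j within [t - T_h, t]
    candidates = [link for link, cost in sowd.items()
                  if link != current and cost <= d_max
                  and link in has_room and link not in recently_used]
    if not candidates:
        # Assumption: the text does not say what happens here; we keep waiting
        return ("no_candidate", None)
    best_cost = min(sowd[link] for link in candidates)
    tied = [link for link in candidates if sowd[link] == best_cost]
    # QBGP tie-break among equal-SOWD links (lower rank preferred)
    target = min(tied, key=lambda link: qbgp_rank.get(link, float("inf")))
    # Caller then waits T_j (contention), re-checks, adds P to R_j(t) and reroutes
    return ("contend_then_reroute", target)


# Usage with hypothetical SOWD values (ms) for a class with D_j = 85 ms
sowd = {"m1": 95.0, "m2": 80.0, "m3": 80.0, "m4": 70.0}
action = overlay_decision("m1", sowd, d_max=85.0, r_now=0.0, r_max=float("inf"),
                          has_room={"m2", "m3", "m4"}, recently_used={"m4"},
                          qbgp_rank={"m2": 2, "m3": 1})
```

Here `m4` has the lowest SOWD but is excluded by the hold-down, and the tie between `m2` and `m3` is resolved by the QBGP preference.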




Fig. 2. Combined QoSR Algorithm 



4 Simulation Results 

The Overlay Architecture proposed in this paper is being evaluated and validated by 
simulation. In this section some preliminary results are presented to allow a first 
evaluation of the overall architecture and its capability to support QoS traffic classes 
in a dynamic way. We are using the J-Sim simulator [13] with the BGP Infonet suite
[14], which is compliant with the BGP specification RFC 1771 [15]. A set of Java
components implementing the functionalities of the overlay layer was developed. In order to allow
the Overlay Entities to have full access to the Adj-RIBs-In and the Loc-RIB of a BGP 
speaker [15], and to have control over the BGP decision process, it was necessary to 
add some extensions to the Infonet suite. Furthermore, we have also included the fol- 
lowing QoS BGP extensions: 

- An optional transitive attribute to distribute the CoS identification (ID), and a set of 
modifications to BGP tables to allow the storage of this additional information, fol- 
lowing a similar approach to the one described in [3]. 

- A set of mechanisms to: i) allow BGP speakers to load the supported CoSs; 
ii) allow each local IP prefix to be announced within a given CoS; iii) allow BGP 
speakers to set the permissibility based on local QoS policies and supported capa- 
bilities. 

For our simulations, we used the topology presented in Fig. 7. The topology is based on the GEANT European Academic Backbone, with some simplifications to reduce the complexity of the simulation model. In this topology we considered AS1 and AS2 as the remote multi-homed AS domains. All links were assumed to be bidirectional with the same capacity $C$ ($C$ = 2 Mbps) and propagation delay $P_d$ ($P_d$ = 10 ms), except for the AS1 links where, in order to create a bottleneck, the capacity chosen was $C/2$. To limit complexity, we modeled each AS as a single QBGP router with core DiffServ capabilities configured to support four different IP packet treatments (EF, AF11, AF21 and Best-effort), allowing four different CoSs, namely CoS1, CoS2, CoS3 and CoS4. In the domain where traffic was injected, we used edge DiffServ capabilities to mark packets with a specific DSCP (DiffServ Code Point) depending on their CoS. These marks were applied both to regular IP packets and to the probes generated by the OE. The test conditions are summarized in Table 1, and the results obtained are presented in Fig. 3 to Fig. 6. The maximum SOWD tolerated per CoS ($D_j$) was chosen heuristically to allow the OEs to take advantage of alternative paths. The SOWD computed when probes were lost was also chosen heuristically: the criterion selected was that 3 consecutive losses imply a rise of nearly 25% in the SOWD. For the tests presented we set $R_j^{max} = \infty \ \forall j$. Moreover, no probes were generated for Best-effort traffic (N.A = Not Available), and a sliding window of 3 seconds, shown as Mov. Average in Table 1, was used in all tests.



Table 1. Test conditions

CoS   CBR     Pkt. size  PHB   Max. SOWD  Probes           Hold (contention)  Mov.
      (Mbps)  (KB)             (ms)       (period, size)   & T_h (s)          average
CoS1  0.4     1          EF    85         1 s, 1 KB        3 & 8              3 s
CoS2  0.8     1          AF11  100        1 s, 1 KB        6 & 12             3 s
CoS3  ?       1          AF21  120        1 s, 1 KB        9 & 20             3 s
CoS4  ?       N.A        BE    N.A        N.A              N.A                N.A


The first objective of the simulation was to validate the initial assumption that our approach, based on a complementary routing layer, enhances the reaction of the overall routing infrastructure. As a performance indicator, we chose to compare the response time to a link failure. Fig. 3 depicts a set of plots for traffic of CoS1 showing the throughput measured at the destination, the SOWD experienced by probes on all available paths, and the path shifts determined by changes in the next hop for the source AS, namely AS1. From these plots, we can observe that a pure QBGP framework (without OEs running on AS1 and AS2) needs about 80 seconds to overcome a link failure, whereas only 5 seconds are needed when OEs are running. This result validates our initial assumption. It is worth mentioning that this last value includes not only the implicit link failure detection, based on a violation of the maximum tolerated SOWD, but also a random contention interval of 3 seconds before rerouting.

Fig. 3. Link failure reaction with and without OE

Fig. 4. Throughput for traffic of CoS1-CoS4, with and without OE

Fig. 5. OWD in all available paths for CoS1-CoS4, with and without OE

Fig. 6. Remote AS link utilization

Fig. 7. Topology based on the GEANT Network [16]

Secondly, from Figs. 4 and 5 we can observe that without OEs there are clear violations of the SLAs established between the end-to-end domains. With OEs, however, the architecture is able to react to SLA violations and find the best paths to reallocate traffic for the affected classes. Consequently, after a transitory interval of approximately 13 seconds, needed to accommodate the traffic of each CoS, a steady state is reached and the SLAs are satisfied for all affected classes. Furthermore, in order to evaluate overall link utilization, we measured the throughput over all available links at the destination AS (AS2). Fig. 6 shows that with OEs, in addition to compliance with the SLAs, a better distribution of inter-domain traffic is obtained, and thus resources are used more efficiently. The extra cost in these cases was merely an increment of 8 Kbps per CoS on each link carrying the remote AS-AS traffic, when oversized probes of 1 KB were spawned.
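The 8 Kbps figure is consistent with one 1 KB probe per second per CoS (the probe settings in Table 1), assuming 1 KB = 1000 bytes. A quick check, where the helper name is hypothetical and the per-link total assumes that the three probed classes (CoS1-CoS3) share the same link:

```python
def probe_overhead_kbps(probe_bytes: int, period_s: float, probed_classes: int = 1) -> float:
    """Probe traffic added to one link, in kbit/s (1 KB taken as 1000 bytes)."""
    return probe_bytes * 8 / period_s / 1000 * probed_classes


per_cos = probe_overhead_kbps(1000, 1.0)        # one 1 KB probe per second for one CoS
per_link = probe_overhead_kbps(1000, 1.0, 3)    # CoS1-CoS3 probed, Best-effort not probed
share_of_link = per_link / 2000                 # fraction of a C = 2 Mbps link
```

Under these assumptions the overhead is 8 kbit/s per CoS and 24 kbit/s per link, i.e. about 1.2% of a 2 Mbps link (2.4% of a $C/2$ bottleneck link).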



5 Conclusions 

This paper presents the framework for a combined inter-domain QoSR paradigm based on a completely distributed Overlay Architecture coupled with a QBGP or TE-BGP routing layer. As a first step in our research, and in order to validate our approach, we have focused on coupling the overlay with an underlying DiffServ QBGP layer. The results obtained show that our distributed Overlay Architecture substantially enhances end-to-end QoS when compared with a pure QBGP model. We believe that, while significant extensions and enhancements to BGP are certainly going to be seen, the overlay structure is a strong candidate to provide flexible, value-added, out-of-band inter-domain QoSR. This is particularly the case when inter-domain traffic patterns need to adapt dynamically and react rapidly to moderately or highly dynamic network conditions, for which the former solutions currently seem impracticable.
Policy-Aware Connectionless Routing*
Abstract. The current Internet implements hop-by-hop packet forwarding based
entirely on globally-unique identifiers specified in packet headers, and routing
tables that identify destinations with globally unique identifiers and specify the 
next hops to such destinations. This model is very robust; however, it supports 
only a single forwarding class per destination. As a result, the Internet must rely 
on mechanisms working “on top” of IP to support quality-of-service (QoS) or 
traffic engineering (TE). We present the first policy-based connectionless routing 
architecture and algorithms to support QoS and TE as part of the basic network- 
level service of the Internet. We show that policy-aware connectionless routing can 
be accomplished with roughly the same computational efficiency of the traditional 
single-path shortest-path routing approach. 



1 Introduction 

The current Internet architecture is built around the notion that the network layer provides 
a single-class best-effort service. This service is provided with routing protocols that 
adapt to changes in the Internet topology, and a packet forwarding method based on a 
single class of service for all destinations. Using one or more routing protocols, each 
router maintains a routing-table entry for each destination specifying the globally unique 
identifier for the destination (i.e., an IP address range) and the next hop along the one 
path chosen for the destination. Based on such routing tables, each router forwards data 
packets independently of other routers and based solely on the next-hop entries specified 
in its routing table. This routing model is very robust. However, there are many examples 
of network performance requirements and resource usage policies in the Internet that 
are not homogeneous, which requires supporting multiple service classes [2].

Policy-based routing involves the forwarding of traffic over paths that honor policies 
defining performance and resource-utilization requirements. Quality-of-service (QoS) 
routing is the special case of policy-based routing in the context of performance policies, 
and traffic engineering (TE) is routing in the context of resource-utilization policies. 
Section 2 reviews previous solutions for supporting QoS and TE. All past and current 
approaches to supporting QoS and TE in the Internet have been implemented “on top” 
of the single-class routing tables of the basic Internet routing model. This leads to
inefficient allocation of the available bandwidth, given that paths computed based on
shortest-path routing within autonomous systems have little to do with QoS and TE
constraints. Furthermore, current proposals for policy-based routing [1] are connection-oriented,
and require source-specified forwarding implemented by source routing or
some form of path setup.

* This work was supported in part by the Defense Advanced Research Projects Agency (DARPA)
under Grant N66001-00-8942 and by the Baskin Chair of Computer Engineering at UCSC.

There are two key reasons why policy-based path selection (with QoS and TE con- 
straints) has not been addressed as an integral part of the basic routing model of the 
Internet. Routing with multiple constraints is known to be NP-hard [8], and the basic
Internet packet-forwarding scheme is based on globally-unique destination identifiers. 
This paper introduces the first policy-aware connectionless routing model for the In- 
ternet addressing these two limitations. It consists of the routing architecture presented 
in Section 3, and the path-selection algorithms presented in Section 4. The proposed 
policy-aware connectionless routing (PACR) architecture is the first to extend the notion 
of label swapping and threaded indices [4] into connectionless packet forwarding with 
multiple service classes. The path-selection algorithms we introduce generalize Dijk- 
stra’s shortest path first (SPF) algorithm to account for both TE and QoS constraints. 
These algorithms have been shown to be correct (i.e., they compute loop-free paths satisfying
TE and QoS constraints within a finite time) [12], and are the first of their kind
to attain computational efficiencies close to that of SPF for typical Internet topologies. 



2 Previous Work 



Resource Management: Two Internet QoS architectures have been developed for resource management: the integrated services (intserv) architecture [2] and the differentiated services (diffserv) architecture. Both work “on top” of an underlying packet-forwarding scheme. In intserv, network resources must be explicitly controlled: applications reserve the network resources required to implement their functionality, and admission control, traffic classification, and traffic scheduling mechanisms implement the reservations. Diffserv provides resource management without the use of explicit reservations. A set of per-hop forwarding behaviors (PHBs) is defined within a diffserv domain to provide resource management services appropriate to a class of application resource requirements. Traffic classifiers deployed at the edge of a diffserv domain classify traffic into one of these PHBs. Inside a diffserv domain, routing is performed using the traditional hop-by-hop, single-class mechanisms.

Resource management for TE is quite simple. The desired resource utilization poli- 
cies are used as constraints to the path-selection function, and traffic classification and 
policy-based forwarding mechanisms are used to implement the computed paths. Cur- 
rent proposals [1] define resource-utilization policies by assigning network resources to 
resource classes, and then specifying what resource classes can be used for forwarding 
each traffic class. 

Routing Architectures: Currently proposed policy-based routing architectures are 
based on a centralized routing model where routes are computed on-demand (e.g. on 
receipt of the first packet in a flow, or on request by a network administrator), and forward- 
ing is source-specified through the use of source routing or path setup [6] techniques. 
These solutions are less robust, efficient, and responsive than the original distributed 
routing method. The forwarding paths in on-demand routing are brittle, because the 
ingress router controls remote forwarding state in routers along paths it has set up, and
must re-establish the paths in the event that established paths are broken. 

Path Selection: The seminal work on computing paths in the context of more than one additive metric was done by Jaffe [8], who defined the multiply-constrained path problem (MCP) as the computation of paths in the context of two additive metrics. He presented an enhanced distributed Bellman-Ford (BF) algorithm that solved this problem with time complexity $O(n^4 b \log(nb))$, where $n$ is the number of nodes in a graph and $b$ is the largest possible metric value. Since Jaffe's work, many solutions have been proposed for computing exact paths in the context of multiple metrics for special situations. Wang and Crowcroft [14] were the first to present a solution for computing paths in the context of a concave (i.e., “minmax”) and an additive metric. Ma and Steenkiste [9] presented a modified BF algorithm that computes, in polynomial time, paths satisfying delay, delay-jitter, and buffer-space constraints in the context of weighted-fair-queuing scheduling algorithms. Cavendish and Gerla [3] presented a modified BF algorithm with complexity $O(n^3)$ which computes multi-constrained paths if all metrics of paths in an internet are either non-decreasing or non-increasing as a function of the hop count. Recent work by Siachalou and Georgiadis [11] on MCP has resulted in an algorithm with complexity $O(nW \log(n))$. This algorithm is a special case of the policy-based path selection presented in Section 4 of this paper.

Several other algorithms have been proposed for computing approximate solutions 
to the QoS path-selection problem. Both Jaffe [8] and Chen and Nahrstedt [5] propose 
algorithms which map a subset of the metrics comprising a link weight to a reduced 
range, and show that using such solutions, the cost of a policy-based path computation 
can be controlled at the expense of the accuracy of the selected paths. Similarly, a number 
of researchers [8,10] have presented algorithms that compute paths based on a function 
of the multiple metrics comprising a link weight. These approximation solutions do not 
work with administrative traffic constraints. 



3 Policy-Aware Connectionless Routing (PACR) 

Policy-based routing requires the ability to compute and forward traffic over multiple 
paths for a given destination. For TE, multiple paths may exist that satisfy disjoint 
network usage policies. For QoS, there may not exist a universally “best” route to a 
given node in a graph. For example, which of two paths is best when one has a delay of 5 ms and jitter of 4 ms, and the other has a delay of 10 ms and jitter of 1 ms, depends on which metric is more critical for a given application. For FTP traffic, where delay is important and jitter is not, the former would be more desirable. Conversely, for video streaming, where jitter is very important and delay is relatively unimportant, the latter would be preferred. Such weights are said to be incomparable. In contrast, it is possible for one route to be clearly “better than” another in the context of multi-component link weights. For instance, a route with a delay of 5 ms and jitter of 1 ms is clearly better than a route with a delay of 10 ms and jitter of 5 ms for all possible application requirements.
Such weights are said to be comparable. 
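The comparability notion above can be made concrete with a small dominance test. This is a sketch under the assumption that every weight component is a cost (lower is better), as with delay and jitter; a component such as available bandwidth would have to be negated first. The function names are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is 'better than' b: no worse in every component and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def comparable(a: Sequence[float], b: Sequence[float]) -> bool:
    """Two weights are comparable if they are equal or one dominates the other."""
    return tuple(a) == tuple(b) or dominates(a, b) or dominates(b, a)


# The (delay ms, jitter ms) examples from the text
print(comparable((5, 4), (10, 1)))   # False: the FTP vs. video-streaming trade-off
print(comparable((5, 1), (10, 5)))   # True: (5, 1) is better for every application
```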

The goal of routing in the context of multi-component link weights is to find the 
largest set of paths to each destination with weights that are mutually incomparable. 
The weights in such a set are called the performance classes of a destination. Supporting
policy-based connectionless routing requires three main functions: (a) computing and 
maintaining routes that satisfy QoS and TE constraints for each destination, (b) classi- 
fying traffic intended for a given destination on the basis of TE and QoS constraints, 
and (c) forwarding the classified traffic solely on the basis of the next hops specified 
in the routing tables of routers for a given destination and for a given traffic class. The 
rest of this section outlines our proposed solution, which we call PACR (policy-aware 
connectionless routing). 
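To illustrate what "the largest set of mutually incomparable weights" means, the sketch below filters a list of candidate path weights down to the ones that no other candidate dominates (a Pareto filter). It is a brute-force quadratic illustration under the same lower-is-better assumption, not the PACR path-selection algorithm, which builds this set incrementally; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Weight = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """No worse in every component and strictly better in at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def performance_classes(weights: Sequence[Weight]) -> List[Weight]:
    """Keep the mutually incomparable weights, i.e. those no other weight dominates."""
    unique = list(dict.fromkeys(weights))   # drop duplicates, keep first-seen order
    return [w for w in unique if not any(dominates(v, w) for v in unique)]


# Candidate (delay, jitter) weights towards one destination
print(performance_classes([(5, 4), (10, 1), (10, 5), (5, 4)]))   # [(5, 4), (10, 1)]
```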



3.1 Policy-Aware Route Computation 

The routing protocols used in PACR must be designed to carry the link metrics required to 
implement the desired QoS and TE policies. This requires the use of either a topology- 
broadcast (also called “link-state”) or link-vector routing protocol [7] that exchanges 
information describing the state of links. The implementation of these routing protocols 
consists of two main parts: (a) information exchange signaling, and (b) local path- 
selection. 

The signaling component of the protocols is straightforward, because it suffices to 
re-engineer the signaling of one of many existing routing protocols to accommodate 
QoS and TE parameters of links. As we discuss subsequently, routing-table information 
can be exchanged in such signaling, in addition to link-state information. The path- 
selection component of the protocols is the complex part of PACR, because the path- 
selection algorithm used must produce paths that satisfy QoS and TE constraints at 
roughly the same speed with which today’s shortest-path algorithms compute paths in a 
typical Internet topology. Section 4 presents the new path-selection algorithms required 
in PACR, which arguably constitute the main contribution of the new architecture. 



3.2 Packet Forwarding 

Solutions for packet classification already exist and can be applied to distributed policy-based routing. However, forwarding packets solely based on IP addresses would require each relay of a packet to classify the packet before forwarding it according to the content of its routing table. We propose using label-swap forwarding technology so that only the first router that handles a packet needs to classify it before forwarding it. Accordingly, the forwarding state of a router must be enhanced to include local and next-hop label information, in addition to the destination and next-hop information present in traditional forwarding tables. Traffic classifiers must then be placed at the edge of an internet, where “edge” is defined to be any point from which traffic can be injected into the internet. Figure 1 illustrates the resulting traffic flow requirements of a router in PACR.

Fig. 1. Traffic flow in PACR

Fig. 2. Forwarding labels in PACR

To date, label-swapping has been used in the context of connection-oriented (virtual 
circuit) packet forwarding architectures. A connection setup phase establishes the labels 
that routers should use to forward packets carrying such labels, and a label refers to 
an active source-destination connection [6]. Chandranmenon and Varghese [4] present 
threaded indices, in which neighboring routers share labels corresponding to indexes 
into their routing tables for routing-table entries for destinations, and such labels are 
included in packet headers to allow rapid forwarding-table lookups. 

The forwarding labels in PACR are similar to threaded indices. A label is assigned 
to each routing-table entry, and each routing-table entry corresponds to a policy-based 
route maintained for a given destination. Consequently, for each destination, a router 
exchanges one or multiple labels with its neighbors. Each label assigned to a destination 
corresponds to the set of service classes satisfied by the route identified by the label. For 
example, Figure 2 shows a small network with four nodes with the forwarding tables 
at each node, two administrative classes A and B, and the given forwarding state for 
reaching the other nodes. 
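The label mechanism above can be sketched as follows. The topology, labels and class names are hypothetical (they are not the tables of Fig. 2), and the convention that label 0 marks arrival at the destination is an assumption. Only the ingress router classifies; every later hop performs a single indexed lookup and swaps the label, in the style of threaded indices.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class RouteEntry:
    destination: str
    classes: FrozenSet[str]   # service classes satisfied by this policy-based route
    next_hop: str
    next_label: int           # label the next hop assigned to its matching entry


class PacrRouter:
    """Hypothetical router: one local label per routing-table entry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[int, RouteEntry] = {}

    def install(self, label: int, entry: RouteEntry) -> None:
        self.entries[label] = entry

    def classify(self, destination: str, traffic_class: str) -> int:
        """Edge function: map (destination, class) to a local label, done only once."""
        for label, e in self.entries.items():
            if e.destination == destination and traffic_class in e.classes:
                return label
        raise LookupError(f"no route to {destination} for class {traffic_class}")

    def forward(self, label: int) -> Tuple[str, int]:
        """Core function: one indexed lookup and a label swap, no reclassification."""
        e = self.entries[label]
        return e.next_hop, e.next_label


# Usage: class A reaches d via b, class B via c (hypothetical topology)
routers = {n: PacrRouter(n) for n in "abc"}
routers["a"].install(1, RouteEntry("d", frozenset({"A"}), "b", 5))
routers["a"].install(2, RouteEntry("d", frozenset({"B"}), "c", 9))
routers["b"].install(5, RouteEntry("d", frozenset({"A"}), "d", 0))
routers["c"].install(9, RouteEntry("d", frozenset({"B"}), "d", 0))


def walk(src: str, dst: str, cls: str) -> List[str]:
    hops, node = [src], src
    label = routers[src].classify(dst, cls)      # classification at the edge only
    while node != dst:
        node, label = routers[node].forward(label)
        hops.append(node)
    return hops
```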

4 Policy-Based Path-Selection Algorithm

We model a network as a weighted undirected graph $G = (N, E)$, where $N$ and $E$ are the node and edge sets, respectively. By convention, the sizes of these sets are given by $n = |N|$ and $m = |E|$. Elements of $E$ are unordered pairs of distinct nodes in $N$. $A(i)$ is the set of edges adjacent to $i$ in the graph. Each link $(i,j) \in E$ is assigned a weight, denoted by $\omega_{ij}$. A path is a sequence of nodes $\langle x_1, x_2, \ldots, x_d \rangle$ such that $(x_i, x_{i+1}) \in E$ for every $i = 1, 2, \ldots, d-1$, and all nodes in the path are distinct. The weight of a path is given by $\omega_P = \bigoplus_{i=1}^{d-1} \omega_{x_i x_{i+1}}$. The nature of these weights, and the functions used to combine link weights into path weights, are specified for each algorithm.

4.1 Administrative Policies 

We use a declarative model of administrative policies in which constraints on the traffic 
allowed in an internet are specified by expressions in a boolean traffic algebra. The traffic 
algebra is composed of the standard boolean operations on the set {0, 1}, where a set 
of p primitive propositions (variables) represent statements describing characteristics of 
network traffic or global state that are either true or false. The syntax for expressions in 
the algebra is specified by the BNF grammar:

$$\varphi ::= 0 \mid 1 \mid v_1 \ldots v_p \mid (\neg\varphi) \mid (\varphi \wedge \varphi) \mid (\varphi \vee \varphi) \mid (\varphi \rightarrow \varphi)$$
The set of primitive propositions, indicated by $v_i$ in the grammar, can be defined in terms
of network traffic characteristics or global state. Administrative policies are specified 
for an internet by assigning expressions in the algebra to links in the graph, called 
link predicates. These predicates define a set of forwarding classes, and constrain the 
topology that traffic for each forwarding class is authorized to traverse, as required by 
the administrative policies. 

A $\mathrm{SAT}(\varphi)$ primitive, which is the SATISFIABILITY problem of traditional propositional logic, is required for expressions in the traffic algebra. Satisfiability must be tested in two situations: to determine whether traffic classes exist that are authorized to use an extension of a known route, and to determine whether all traffic authorized for a new route is already satisfied by known shorter routes. The first is true iff the conjunction of the route's expression and the link predicate is satisfiable (i.e., $\mathrm{SAT}(\varepsilon_i \wedge \varepsilon_{ij})$). The second is true iff the new route's traffic expression implies the disjunction of the traffic expressions for all known better routes (i.e., $(\varepsilon_* \rightarrow \varepsilon_1 \vee \varepsilon_2 \vee \cdots)$ is valid, which is denoted by $(\varepsilon_* \rightarrow \varepsilon_f)$ in the algorithms). Determining whether an expression is valid is equivalent to determining whether its negation is unsatisfiable. Therefore, expressions of the form $\varepsilon_1 \rightarrow \varepsilon_2$ are equivalent to $\neg\mathrm{SAT}(\neg(\varepsilon_1 \rightarrow \varepsilon_2))$ (or $\neg\mathrm{SAT}(\varepsilon_1 \wedge \neg\varepsilon_2)$). Satisfiability has many restricted versions that are computable in polynomial time. We have implemented an efficient, restricted solution to the SAT problem by implementing the traffic algebra as a set algebra with the set operations of intersection, union, and complement on the set of all possible forwarding classes.
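Because the traffic algebra is implemented as a set algebra, both SAT tests reduce to simple set operations. The following is a minimal Python sketch, assuming each expression is represented by the frozenset of truth assignments that satisfy it over $p$ propositions (so the universe has $2^p$ elements); the helper names are hypothetical and do not come from PACR.

```python
from itertools import product

# Hypothetical representation: an expression is the frozenset of truth
# assignments (tuples of p booleans) under which it evaluates to true.

def universe(p):
    """All 2**p truth assignments over p primitive propositions."""
    return frozenset(product((False, True), repeat=p))

def var(i, p):
    """Primitive proposition v_i (0-based index)."""
    return frozenset(a for a in universe(p) if a[i])

def neg(e, p):
    return universe(p) - e          # negation -> complement

def conj(e1, e2):
    return e1 & e2                  # conjunction -> intersection

def disj(e1, e2):
    return e1 | e2                  # disjunction -> union

def sat(e):
    """SAT(e): at least one truth assignment satisfies e."""
    return bool(e)

def implies_valid(e1, e2):
    """e1 -> e2 is valid iff NOT SAT(e1 AND NOT e2), i.e. e1 is a subset of e2."""
    return not sat(e1 - e2)
```

For example, with $p = 2$, a route labeled $v_1$ can be extended over a link labeled $v_0$ because `conj(var(1, 2), var(0, 2))` is non-empty.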



4.2 Performance Characteristics 

Path weights are composed of multi-component metrics that capture all important per- 
formance measures of a link such as delay, delay variance (“jitter”), available bandwidth, 
etc. Our path-selection algorithm is based on an enhanced version of the path algebra 
defined by Sobrinho [13], which we enhance to support the computation of the best set 
of routes for each destination. Formally, the path algebra $P = \langle W, \oplus, \preceq, \sqsubseteq, 0, \infty \rangle$ is defined as a set of weights $W$, with a binary operator $\oplus$ and two order relations, $\preceq$ and $\sqsubseteq$, defined on $W$. There are two distinguished weights in $W$, $0$ and $\infty$, representing the least and absorptive elements of $W$, respectively. Operator $\oplus$ is the original path composition operator, and relation $\preceq$ is the original total ordering from [13]. Operator $\oplus$ is used to compute path weights from link weights. The relation $\preceq$ is used by the routing algorithm to build the forwarding set, starting with the minimal element, and by the forwarding process to select the minimal element of the forwarding set whose parameters satisfy a given QoS request.
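A minimal Python sketch of $\oplus$ and of the forwarding step just described, assuming two-component additive weights (delay, cost) and using Python's lexicographic tuple order as a stand-in for $\preceq$; the weight layout, the request format (max_delay, max_cost), and all function names are hypothetical.

```python
# Hypothetical two-component additive weights: (delay, cost).

def compose(w1, w2):
    """Path-composition operator: componentwise addition of link weights."""
    return (w1[0] + w2[0], w1[1] + w2[1])

def path_weight(link_weights, zero=(0, 0)):
    """Combine the link weights of a path, starting from the least element."""
    w = zero
    for lw in link_weights:
        w = compose(w, lw)
    return w

def select_route(forwarding_set, max_delay, max_cost):
    """Forwarding step: the minimal weight (lexicographic tuple order) in the
    forwarding set that satisfies the QoS request, or None if none does."""
    feasible = [w for w in forwarding_set
                if w[0] <= max_delay and w[1] <= max_cost]
    return min(feasible) if feasible else None
```

Any other total order on weights could be substituted for the tuple order; only `select_route` depends on it.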

We add a new relation on routes, $\sqsubseteq$, to the algebra and use it to define classes of comparable routes, selecting the maximal elements of these classes for inclusion in the set of forwarding entries for a given destination. Relation $\sqsubseteq$ is a partial ordering (reflexive, antisymmetric, and transitive) with the additional property that $(\omega_x \sqsubseteq \omega_y) \Rightarrow (\omega_x \preceq \omega_y)$. The relation $\sqsubseteq$ defines an ordering on routes in terms of the containment (subset) of the set of constraints satisfied by one route within the set satisfied by another; i.e., if $\omega_i \sqsubseteq \omega_j$, then the set of constraints that route $i$ can satisfy is a subset of those satisfiable by route $j$.

algorithm Policy-Based-Dijkstra
begin
    Push(<s, s, 0, 1>, P_s);
    for each {(s, j) ∈ A(s)}
        Insert(<j, s, ω_sj, ε_sj>, T);
    while (|T| ≠ 0)
    begin
        <i, p_i, ω_i, ε_i> ← Min(T);
        DeleteMin(B_i);
        if (|B_i| = 0)
        then DeleteMin(T)
        else IncreaseKey(Min(B_i), T_i);
        ε_tmp ← ε_i; ptr ← Tail(P_i);
        while ((ε_tmp ≠ 0) ∧ (ptr ≠ 0))
            ε_tmp ← ε_tmp ∧ ¬ptr.ε; ptr ← ptr.next;
        if (ε_tmp ≠ 0)
        then begin
            Push(<i, p_i, ω_i, ε_i>, P_i);
            for each {(i, j) ∈ A(i) | SAT(ε_i ∧ ε_ij)}
            begin
                ω_j ← ω_i ⊕ ω_ij; ε_j ← ε_i ∧ ε_ij;
                if (T_j = 0)
                then Insert(<j, i, ω_j, ε_j>, T)
                else if (ω_j ≺ T_j.ω)
                then DecreaseKey(<j, i, ω_j, ε_j>, T);
                Insert(<j, i, ω_j, ε_j>, B_j);
            end
        end
    end
end



Fig. 3. General-Policy-Based Dijkstra. 



Table 1. Notation.

P_n = Queue of permanent routes to node n.
T   = Heap of temporary routes.
T_n = Entry in T for node n.
B_n = Balanced tree of routes for node n.



A route $r_m$ is a maximal element of a set $R$ of routes in a graph if the only element $r \in R$ with $r_m \sqsubseteq r$ is $r_m$ itself. A set $R_m$ of routes is a maximal subset of $R$ if, for all $r \in R$, either $r \notin R_m$, or $r \in R_m$ and, for all $s \in R - \{r\}$, $\neg(r \sqsubseteq s)$. The maximum size of a maximal subset of routes is the smallest range of the components of the weights (for the two-component weights considered here).
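To make the maximal-subset definition and its size bound concrete, the sketch below assumes (delay, cost) weights and reads $\omega_i \sqsubseteq \omega_j$ as "route $j$ is no worse than route $i$ in either component", which matches the constraint-containment description of $\sqsubseteq$ for requests of the form (max_delay, max_cost). This reading and the helper names are assumptions for illustration. After a lexicographic sort, a weight is maximal exactly when its cost is strictly below every cost kept so far, so the kept delays and costs are all distinct; this is why the subset cannot be larger than the smaller of the two component ranges.

```python
def contained(wi, wj):
    """wi ⊑ wj under the (delay, cost) reading: route j is no worse than
    route i in either component, so it meets every request route i meets."""
    return wj[0] <= wi[0] and wj[1] <= wi[1]

def maximal_subset(weights):
    """Weights not contained in any other distinct weight (duplicates kept once)."""
    front = []
    for w in sorted(set(weights)):          # by delay, then by cost
        # Every earlier weight has delay <= w's delay, so w survives only if
        # its cost beats the smallest cost seen so far (the last one kept).
        if not front or w[1] < front[-1][1]:
            front.append(w)
    return front
```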

4.3 Path Selection 

Path selection in PACR consists of computing the maximal set of routes to each des- 
tination in an internet for each traffic class (stated through link predicates and multi- 
component link weights) for which a path to the destination exists. 

The path-selection algorithm in PACR maintains a balanced tree $B_i$ for each node $i$ in the graph to hold newly discovered, temporarily labeled routes for node $i$. A heap $T$ contains the lightest-weight entry from each non-empty $B_i$ (for a maximum of $n$ entries), and the heap entry for node $i$ is denoted by $T_i$. Lastly, a queue $P_i$ is maintained for each node; it contains the set of permanently labeled routes discovered by the algorithm, in the order in which they are discovered (which is in increasing weight). The general flow of the path-selection algorithm is to take the minimum entry from the heap $T$ and compare it with the existing routes in the appropriate $P_i$; if it is incomparable with them, it is pushed onto $P_i$, and "relaxed" routes for its neighbors are added to the appropriate $B_x$'s.
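The general flow just described can be sketched as a runnable label-setting routine. This is a simplified illustration under stated assumptions, not PACR's implementation: a single binary heap (Python's `heapq`) replaces the heap $T$ plus the per-node balanced trees $B_n$; traffic expressions are frozensets of traffic classes (the set-algebra realization of Section 4.1); weights are additive (delay, cost) pairs popped in lexicographic order; and a popped route is discarded only when every class it carries is already served by a permanent route that is no worse in both components, a dominance test added here for two-component weights. All names are hypothetical.

```python
import heapq
from itertools import count

def no_worse(a, b):
    """True if weight a is no worse than weight b in every component."""
    return all(x <= y for x, y in zip(a, b))

def policy_based_dijkstra(graph, source, traffic_classes):
    """graph: {u: [(v, (delay, cost), frozenset_of_allowed_classes), ...]}.
    Returns {node: [(weight, classes, predecessor), ...]}, the permanent
    routes in the order in which they were made permanent."""
    tie = count()  # tie-breaker so heap entries never compare class sets
    heap = [((0, 0), next(tie), source, source, frozenset(traffic_classes))]
    permanent = {}
    while heap:
        w, _, node, pred, classes = heapq.heappop(heap)
        known = permanent.setdefault(node, [])
        # Analogue of the eps_tmp loop: drop classes already served by a
        # permanent route to this node that is no worse than this one.
        residual = set(classes)
        for kw, kclasses, _ in known:
            if no_worse(kw, w):
                residual -= kclasses
        if not residual:
            continue  # every class already has an equal-or-better route
        known.append((w, classes, pred))
        for nxt, lw, allowed in graph.get(node, []):
            ext = classes & allowed          # SAT(eps_i AND eps_ij): non-empty set
            if ext:
                nw = tuple(a + b for a, b in zip(w, lw))
                heapq.heappush(heap, (nw, next(tie), nxt, node, ext))
    return permanent
```

With non-negative weights, every extension of a discarded route is dominated by the same extension of a route that covered it, which is why discarding is safe in this sketch.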

The correctness of the PACR path-selection algorithm is based on the maintenance of the following three invariants: for all routes $I \in P$ and $J \in B_*$, $I < J$; all routes to a given destination $i$ in $P$ are incomparable for some set of satisfying truth assignments; and the maximal subset of routes to a given destination $j$ in $P_j \cup B_j$ represents the maximal subset of all paths to $j$ using nodes with routes in $P$. Furthermore, these invariants are maintained by the following two constraints on the actions performed in each iteration of these algorithms: (1) only known-non-maximal routes are deleted or discarded, and (2) only the smallest known-maximal route to a destination $i$ is moved to $P_i$. The details of this proof are presented elsewhere [12].

Table 2. Operations on data structures.

Notation               Description
Queue
  Push(r, Q)           Insert record r at tail of queue Q (O(1))
  Head(Q)              Return record at head of queue Q (O(1))
  Pop(Q)               Delete record at head of queue Q (O(1))
  PopTail(Q)           Delete record at tail of queue Q (O(1))
d-Heap
  Insert(r, H)         Insert record r in heap H (O(log_d(n)))
  IncreaseKey(r, r_h)  Replace record r_h in heap with record r having greater key value (O(d log_d(n)))
  DecreaseKey(r, r_h)  Replace record r_h in heap with record r having smaller key value (O(log_d(n)))
  Min(H)               Return record in heap H with smallest key value (O(1))
  DeleteMin(H)         Delete record in heap H with smallest key value (O(d log_d(n)))
  Delete(r_h)          Delete record from heap (O(d log_d(n)))
Balanced Tree
  Insert(r, B)         Insert record r in tree B (O(log(n)))
  Min(B)               Return record in tree B with smallest key value (O(log(n)))
  DeleteMin(B)         Delete record in tree B with smallest key value (O(log(n)))
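The d-heap costs in Table 2 follow from the heap's shape: sift-up compares a record with one parent per level, sift-down inspects up to $d$ children per level, and there are about $\log_d(n)$ levels. A minimal Python d-ary min-heap illustrates Insert, Min, and DeleteMin; IncreaseKey, DecreaseKey, and Delete are omitted, and the class and method names are hypothetical.

```python
class DHeap:
    """Minimal d-ary min-heap of (key, record) pairs; an illustrative sketch."""

    def __init__(self, d=4):
        if d < 2:
            raise ValueError("d must be at least 2")
        self.d = d
        self.items = []

    def __len__(self):
        return len(self.items)

    def insert(self, key, record):
        # O(log_d n): sift up, comparing with one parent per level.
        self.items.append((key, record))
        i = len(self.items) - 1
        while i > 0:
            parent = (i - 1) // self.d
            if self.items[parent][0] <= self.items[i][0]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent

    def find_min(self):
        # O(1): the root holds the smallest key.
        return self.items[0]

    def delete_min(self):
        # O(d log_d n): move the last entry to the root, then sift down,
        # scanning up to d children on each level.
        top = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            i, n = 0, len(self.items)
            while True:
                first = self.d * i + 1
                if first >= n:
                    break
                child = min(range(first, min(first + self.d, n)),
                            key=lambda c: self.items[c][0])
                if self.items[child][0] >= self.items[i][0]:
                    break
                self.items[i], self.items[child] = self.items[child], self.items[i]
                i = child
        return top
```

A larger $d$ makes Insert and DecreaseKey cheaper at the cost of more expensive DeleteMin, which is the usual reason for tuning $d$ in Dijkstra-style algorithms.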

The PACR path-selection algorithm presented in Figure 3 computes an optimal set of routes to each destination subject to multiple general (additive or concave) path metrics, in the presence of traffic constraints on the links. The notation used in the algorithms presented in the following is summarized in Table 1. Table 2 defines the primitive operations for queues, heaps, and balanced trees used in the algorithms, and gives the time complexities used in the following analysis. The worst-case time complexity of Policy-Based-Dijkstra is $O(nW^2A^2)$, where the maximum number of unique truth assignments is denoted by $A = 2^p$ ($p$ is the number of primitive propositions in the traffic algebra), and the maximum number of unique weights by $W = \min(\text{range of weight components})$. The complexities of special-case variants of this algorithm for traffic engineering and QoS (called the "Basic" algorithms below) are $O(mA\log(A))$ and $O(mW\log(W))$, respectively. Furthermore, for these variants, refinements in the data structures result in algorithms (called the "Enhanced" algorithms below) with $O(mA\log(n))$ and $O(mW\log(n))$ complexity. Details of these variants and the complexity analysis are presented elsewhere [12].

4.4 Performance Results 

Figures 4 and 5 present performance results for the path-selection algorithm. The ex- 
periments were run on a 1GHz Intel Pentium 3 based system. The algorithms were 
implemented using the C++ Standard Template Library (STL) and the Boost Graph Li- 
brary. Each test involved running the algorithm on ten random weight assignments to 
ten randomly generated graphs (generated using the GT-ITM package [15]). 

Fig. 4 shows the worst-case measurements for each test of the QoS algorithms. The "Traditional" algorithm is an implementation of Dijkstra's shortest-path-first (SPF) algorithm, built in the same environment as the other algorithms, and serves as a reference. The metrics were generated using the "Cost 2" scheme from [11], where the delay component is randomly selected in the range 1..MaxMetric, and the cost component is computed as cost = σ(MaxMetric − delay), where σ is a random integer in the range 1..5; this scheme was chosen because, of the several schemes considered, it resulted in the most challenging computations.

Fig. 4. QoS Runtime(Size) (plot: runtime vs. graph size in # vertices; maximum metric = 1000)

Fig. 5. TE Norm Runtime(Size) (plot: normalized runtime vs. vertex count; MaxMet = 1000, Deg = 8, # Traffic Classes = 32)
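A short Python sketch of the "Cost 2" weight generation as described above. The uniform distributions, the function name, and the seeding are assumptions; [11] may specify details that are not reproduced here.

```python
import random

def cost2_weights(edges, max_metric=1000, max_multiplier=5, seed=None):
    """Assign (delay, cost) to each edge: delay uniform in 1..max_metric,
    cost = sigma * (max_metric - delay) with sigma uniform in 1..max_multiplier."""
    rng = random.Random(seed)
    weights = {}
    for edge in edges:
        delay = rng.randint(1, max_metric)       # inclusive bounds
        sigma = rng.randint(1, max_multiplier)
        weights[edge] = (delay, sigma * (max_metric - delay))
    return weights
```

Because cost falls as delay rises, fast links tend to be expensive, which plausibly enlarges the maximal subsets the algorithm has to maintain.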

Tests were run for performance (both runtime and space) as a function of graph 
size, average degree of the graph, and the maximum link metric value. Due to space 
constraints, only the graphs for runtime as a function of size are shown here with a 
maximum metric of 1000. These results show that, while costs increase with both graph 
size and average degree, the magnitude and rate of growth are surprisingly tame for what 
are fundamentally non-polynomial algorithms. 

Fig. 5 shows the performance of the “Basic” traffic engineering algorithm on a similar 
range of parameters. Each data point represents the worst performance of the algorithm 
out of 9 runs (3 randomly generated graphs with 3 random link weight assignments each). 
To control the number of forwarding classes in a graph, each graph was generated as two 
connected subgraphs. Bridge links were then added between 32 randomly selected pairs 
of vertices from each subgraph to form a single graph with at most 32 paths between any 
two nodes in different original subgraphs. 32 tests of the algorithm are then run with all 
traffic classes initially allocated to one bridge link (resulting in one forwarding class for 
all 32 traffic classes), and successive runs are performed with traffic classes distributed 
over one additional link for each test, with the final run allowing one traffic class over each 
bridge link (resulting in a one-to-one mapping of traffic classes to forwarding classes). 
In each test, the link predicate of all non-bridge links is set to allow all traffic classes 
(i.e. it is set to true). Each plot shows the results for 1, 8, 16, 24, and 32 forwarding 
classes in terms of the runtime of the algorithm normalized as a fraction of the “Brute 
Force” runtime required to run the traditional Dijkstra algorithm once for each traffic 
class. The plots show that the algorithm provides significant savings when the number 
of forwarding classes is small, and gracefully degrades as the number of forwarding 
classes grows. 
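The text does not say exactly how the traffic classes were spread over the bridge links in each run. Assuming a round-robin spread (class $c$ allowed only on bridge $c \bmod k$ when $k$ bridges are in use, and every non-bridge link allowing all classes), the sketch below shows why the number of forwarding classes equals the number of bridge links in use: traffic classes whose link-predicate membership is identical everywhere share a forwarding class. Function names are hypothetical.

```python
def bridge_predicates(links_used, num_classes=32, num_bridges=32):
    """Hypothetical round-robin spread: class c is allowed only on bridge
    c % links_used; the remaining bridges allow no class."""
    preds = [set() for _ in range(num_bridges)]
    for c in range(num_classes):
        preds[c % links_used].add(c)
    return preds

def count_forwarding_classes(preds, num_classes=32):
    """Classes with the same membership in every bridge predicate share a
    forwarding class (all-true non-bridge links do not separate classes)."""
    signatures = {tuple(c in p for p in preds) for c in range(num_classes)}
    return len(signatures)
```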



5 Conclusions 

We have defined policy-aware routing as the computation of paths, and the establishment 
of forwarding state to implement paths, in the context of non-homogeneous performance 
requirements and network usage policies. We showed that a fundamental requirement
of policy-aware routing is support for multiple paths to a given destination, and that
the address-based, single-forwarding-class Internet routing model cannot support such a 
requirement. We presented PACR, which is the first policy-based connectionless routing 
architecture, and includes the first policy-based routing solution that provides integrated 
support of QoS and TE. The path-selection algorithms introduced for PACR constitute 
the most efficient algorithms for path selection with QoS and TE constraints known to 
date. Furthermore, their computational efficiency is comparable to that of shortest-path 
routing algorithms, which makes policy-aware connectionless routing in the Internet 
feasible.

Abstract. Max-min is an established fairness criterion for allocating bandwidth for flows. In this work we look at the combined problem of multi-path routing and bandwidth allocation such that the flow allocation for each connection is maximized and fairness is maintained. We use the weighted extension of the max-min criterion to allocate bandwidth in proportion to the flows' demand. Our contribution is an algorithm which, for the first time, solves the combined routing and bandwidth allocation problem for the case where flows are allowed to be split along several paths. We use a multi-commodity flow (MCF) formulation, which is solved using linear programming (LP) techniques. These building blocks are used by our algorithm to derive the required optimal routing and allocation.



1 Introduction 

Traffic engineering is a paradigm in which network operators control the traffic and allocate resources in order to achieve goals such as maximum flow or minimum delay. One challenge is to allow different aggregates of flows to share the network so that the total flow is maximized while fairness is preserved. These flows are derived from customers' service level agreements (SLAs) and are abstracted here as a list of rate allocation demands to be sent between specific source and destination nodes. Given a network topology and a set of demands, network operators may wish to maximize their profit by routing the traffic so as to maximize the total bandwidth the network can carry, namely, the total assigned net flow. To do this, we allow flows to be arbitrarily split inside the network.

One way to maximize the network flow is to formulate it as a maximum multi- 
commodity flow (MCF) problem which can be solved using linear programming (LP). 
Each commodity net flow is assigned to a different client. While the solution maximizes the flow, it does not always do so in a fair manner. Flows that traverse several loaded links will be allocated very little bandwidth or none at all, while flows that traverse short hop distances or meet fewer other flows on their way will receive a large allocation of bandwidth.

In an attempt to introduce fairness to the maximum flow problem the concurrent 
multi-commodity flow problem was suggested. This MCF LP formulation requires that 
all demands will be equally satisfied and seeks a routing that maximizes network flow 
in equal portions per demand. However, the end solution under-utilizes the network, 
sometimes saturating only a small fraction of it. 



J. Sole-Pareta et al. (Eds.): QofIS 2004, LNCS 3266, pp. 278-287, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Fig. 1. An example of flow assignment: maximum flow, maximum concurrent, max-min fair, and WMCM weighted max-min fair algorithms



In this paper we suggest an algorithm, called WMCM (Weighted Max-min fair Concurrent MCF), that finds a routing and resource allocation that maximizes the network link utilization while, at the same time, adhering to the weighted max-min fairness criterion [1]. The WMCM algorithm is based on the maximum concurrent multi-commodity flow problem and bridges the gap between the two other solutions described above.

To clarify the difference between the algorithms, consider the example in Figure 1, which depicts a network with five nodes connected by unit-capacity links. Consider the max-min fairness setting used in [2], where each path represents one flow. We have six flows: two flows from node 1 to node 3, flow 1a using path (1-2-3) and flow 1b using path (1-5-2-3); two flows from node 1 to node 2, one (2a) using link (1,2) and the other (2b) via node 5; one flow (flow 3) from node 2 to node 3; and one flow from node 2 to node 4 (flow 4). The max-min fair [2] vector in this case is (1/4, 1/4, 3/4, 3/4, 1/4, 1/4), where the bottleneck link is shared by four flows, each of which gets 1/4 unit. The setting used for the MCF flow allocation refers to flows per commodity instead of per path. This example has four commodities: commodity 1 with paths 1a and 1b, commodity 2 with paths 2a and 2b, and commodities 3 and 4 with one path each. The maximum MCF problem results in the commodity rate vector (0, 2, 1/2, 1/2) (path vector (0, 0, 1, 1, 1/2, 1/2)), starving both paths (1a and 1b) of commodity 1 to achieve the maximum possible flow of 3 units. The WMCM algorithm with equal weights for all commodities results in the unique commodity rate vector (1/3, 5/3, 1/3, 1/3) for commodities 1, 2, 3, and 4. Note that in this case there is more than one per-path flow allocation that achieves the max-min vector, e.g., (1/3, 0, 2/3, 1, 1/3, 1/3) or (1/6, 1/6, 5/6, 5/6, 1/3, 1/3). If commodity 1 is given a weight twice that of the other commodities (demand vector (2, 1, 1, 1)), the concurrent MCF problem allocates it double the bandwidth allocated to the other flows on its bottleneck link (link (2,3)), and the weighted max-min vector is (1/2, 3/2, 1/4, 1/4).
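The per-path max-min fair vector quoted above can be reproduced with the standard progressive-filling procedure for fixed routes: raise all unfrozen flows at the same rate, and freeze every flow that crosses a link once that link saturates. The sketch assumes flow 4 is routed 2-3-4 (the figure is not reproduced here; that route is what makes link (2,3) carry four flows) and models each link as a unit-capacity arc; exact fractions avoid rounding. Names are hypothetical.

```python
from fractions import Fraction

def max_min_fair(paths, capacity):
    """Progressive filling. paths: {flow: [link, ...]}; capacity: {link: cap}."""
    rate = {f: Fraction(0) for f in paths}
    residual = {link: Fraction(c) for link, c in capacity.items()}
    active = set(paths)
    while active:
        # Number of still-growing flows on each link.
        load = {link: sum(1 for f in active if link in paths[f]) for link in residual}
        increments = [residual[link] / n for link, n in load.items() if n]
        if not increments:
            break  # remaining flows cross no capacitated link
        inc = min(increments)          # largest equal raise before a link fills
        for f in active:
            rate[f] += inc
        for link, n in load.items():
            residual[link] -= inc * n
        saturated = {link for link, r in residual.items() if r == 0}
        active = {f for f in active if saturated.isdisjoint(paths[f])}
    return rate
```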

2 Related Work 

Max-min fair bandwidth allocation has mostly been examined in the context of one fixed path per session, where a session is defined by a pair of terminals. A simple algorithm that finds the max-min fair allocation when the routing is given appears in [2].
Chen and Nahrstedt [3] provide max-min fair allocation routing. They present an
unweighted heuristic algorithm that selects the best path so the fairness-throughput is 
maximized upon an addition of a new flow. Their algorithm assumes the knowledge of 
the possible paths for each new flow. 

Many other distributed algorithms deal with dynamic adjustment of flow rates to maintain max-min fairness when the routes are given [4, 5, 6, 7]. These algorithms differ in their assumptions about the allowed signaling and the available data. Bartal et al. [7] find the total maximum flow allocation in a network for given routes using distributed computations of the global MCF LP problem.

Kelly et al. [8] propose the concept of proportional fairness and a convergence algorithm. Mo and Walrand [9] generalize proportional fairness and produce end-to-end flow control of TCP streams by changing the transmission window size, but again they deal with flow allocation without routing.

LP in Traffic Engineering. Many works have considered multi-commodity flow allocation¹ as the mathematical formalism suitable for traffic engineering and path design. Ott et al. [16] suggest different off-line LP formulations to optimally spread paths so that the excess bandwidth is maximized in MPLS networks. In order to avoid link "overflow" caused by flow fluctuations, they calculate how to reduce the assigned throughput per link. Mitra and Ramakrishnan [17] present algorithms based on the MCF LP problem that allocate bandwidth for the various QoS-type services: QoS and best-effort traffic.

Most of the studies, including the ones mentioned above, choose an MCF formulation that considers the demands, but they do not discuss max-min fairness in conjunction with maximum throughput, as the WMCM algorithm does.

3 Algorithm 

We consider as input a network topology with directed link capacities, a list of ingress-egress pairs, and a per-pair average-rate traffic demand. Traffic between an ingress-egress pair may be split arbitrarily among different paths. We model the network as a general directed graph in which each arc label represents the link capacity. Each (ingress, egress) peering-point pair is represented by a different commodity with some demanded rate.

Our goal is to fulfill clients’ demands optimally while keeping a fair sharing of the 
allocated bandwidth, to lay the set of paths to be used between each pair in the network, 
and to allocate them bandwidth in a maximal way. The fairness criterion is defined by 
the weighted max-min fairness. 

The WMCM algorithm solves, iteratively, the maximum concurrent MCF LP until 
network saturation is achieved. Each iteration is computed on the residual capacity from 
the previous iteration with non-saturated flows left. First, we will provide the formal 
definition of the weighted max-min criterion (subsection 3.1), then we will describe the 
maximum concurrent flow (subsection 3.2), and finally we will present our WMCM 
algorithm (subsection 3.3). 

¹ Relevant mathematical and algorithmic background on MCF can be found in [10, 11, 12]. Specific theoretical aspects of the maximum concurrent MCF problem and its complexity can be found in [13, 14, 15].

3.1 The Weighted Max-Min Criterion: Definitions

The weighted max-min fairness criterion is an extension [1] of the max-min fairness 
criterion [2]. While the max-min definition is stated for the case where each flow takes 
a single path, it can be also applied to the case where a flow may be split among several 
paths. 

Definition 1. The Commodity Rate Vector, cr, is a vector whose elements are the rates 
which were assigned to the commodities. 



Definition 2. A Flow Rate Vector, $f_i$, is a vector of the rates assigned to the set of paths of commodity $i$.

From the above definitions we can write, for any commodity $i$, $\sum_{P_{ij} \in P_i} f_{ij} = cr_i$, where $P_i$ is the set of the paths of commodity $i$. The weighted max-min fair algorithm finds the optimal commodity rate vector $cr^*$ and a flow rate vector $f_i$ for each commodity rate $cr^*_i$, such that $cr^*$ is the lexicographically largest feasible vector.²

Note that the algorithm provides the optimal commodity rate vector and a specific flow rate vector $f_i$ that accommodates the optimal $cr^*_i$, although the flow rate vector may have several valid realizations (see the example of Figure 1).
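The lexicographic order used here (defined in footnote 2) compares vectors after sorting them in increasing order. A minimal Python sketch, with a hypothetical function name, applied to the Figure 1 commodity rate vectors scaled by 6 to keep integers: the maximum-flow vector (0, 2, 1/2, 1/2) becomes (0, 12, 3, 3) and the WMCM vector (1/3, 5/3, 1/3, 1/3) becomes (2, 10, 2, 2).

```python
def lex_greater(v, u):
    """v > u in the footnote-2 sense: compare the increasingly sorted vectors
    at the first position where they differ."""
    for x, y in zip(sorted(v), sorted(u)):
        if x != y:
            return x > y
    return False  # equal (for vectors of the same length)
```

For equal-length lists, Python's built-in list comparison `sorted(v) > sorted(u)` gives the same result.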

Definition 3. A commodity rate vector $cr$ is said to be max-min fair if it is feasible and if each of its elements $cr_i$ cannot be increased without decreasing some other element $cr_k$ for which $cr_i \geq cr_k$.



Definition 4. A commodity rate vector $cr$ is said to be weighted max-min fair if it is feasible and if, for each commodity $i$, $cr_i$ cannot be increased (maintaining feasibility) without decreasing some other element $cr_k$ for which $cr_i / dem_i \geq cr_k / dem_k$.

The two definitions above also hold when traffic may be split to several paths. 



3.2 The Maximum Concurrent MCF Problem 

The maximum concurrent flow problem is stated as follows. Let $G = (V, A)$ be a directed graph with nonnegative capacities $c(a)$, $\forall a \in A$; if $a \notin A$, then $c(a) = 0$. There are $K$ different commodities, $C_1, \ldots, C_K$, where commodity $i$ is specified by the triplet $C_i = (s_i, t_i, dem_i)$: $s_i$ and $t_i$ are the source and the sink of commodity $i$, respectively, and $dem_i$ is its rate demand. Each pair is distinct, but vertices may participate in different pairs. The objective is to maximize $z$ such that, for all $i = 1, \ldots, K$, $z \cdot dem_i$ units of the respective commodities can be routed simultaneously, subject to flow conservation and link capacity constraints. The objective $z$ is thus the common maximal fraction of all demands. There are two ways to formulate this problem: path flow and arc flow formulations.

² For the lexicographical order between two vectors $v$ and $u$, we examine them sorted in increasing order, $\bar{v}$ and $\bar{u}$, respectively. We say that $v > u$ if there is an index $i$ such that $\bar{v}_i > \bar{u}_i$ and $\bar{v}_j = \bar{u}_j$ for $1 \leq j \leq i - 1$. Namely, we find the longest common prefix of the two sorted vectors and define the order according to the first element that differs.

The maximum concurrent flow: path flow formulation

Let $P_i$ be the set of all the paths of commodity $i$ between $s_i$ and $t_i$. The linear program PR assigns the maximum commodity flow to $P_i$ while being restricted by the fairness criterion. The assigned net flow per arc $a$ is the sum of the net flows of the paths passing through this arc. PR's solution is composed of the assigned net flows $f(P_{ij})$, $\forall P_{ij} \in P_i$, $i = 1, \ldots, K$, and the maximal fairness value $z$.

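LP PR: Path Flow Formulation

$$
\begin{aligned}
\text{maximize} \quad & z \\
\text{subject to} \quad & \sum_{i=1}^{K} \sum_{P \in P_i :\, a \in P} f(P) \le c(a), && \forall a \in A && (1)\\
& \sum_{P \in P_i} f(P) \ge z \cdot dem_i, && \forall i = 1, \ldots, K && (2)\\
& f(P) \ge 0,\ z \ge 0, && \forall P \in P_i,\ i = 1, \ldots, K
\end{aligned}
$$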

The size of this linear problem grows with the number of possible paths between any pair 
of nodes and can be exponentially large when the network is highly connected. For this reason, we reformulate the problem so that it can be solved in a polynomial number of steps:

Maximum concurrent flow: arc flow formulation

The variable $acf_k(i, j)$, $\forall k = 1, \ldots, K$, $\forall a = (i, j) \in A$, holds the assigned net flow of commodity $k$ over arc $(i, j)$. For each vertex $v \in V - \{s_k, t_k\}$, we require that the total flow of commodity $k$ into vertex $v$ equals the total flow of commodity $k$ out of vertex $v$. For every $a \in A$, the total flow over all commodities is at most $c(a)$. The total flow rate of commodity $k$ out of vertex $s_k$ represents the maximum possible flow of commodity $k$ and should be greater than or equal to the corresponding fraction of its demand (Eq. 4).



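Problem PR': Arc Formulation

$$
\begin{aligned}
\text{maximize} \quad & z \\
\text{subject to} \quad & \sum_{k=1}^{K} acf_k(u, v) \le c(u, v), && \forall a = (u, v) \in A && (3)\\
& \sum_{v \in V} acf_k(s_k, v) \ge z \cdot dem_k, && \forall k = 1, \ldots, K && (4)\\
& \sum_{u \in V} acf_k(u, v) - \sum_{w \in V} acf_k(v, w) = 0, && \forall k = 1, \ldots, K,\ \forall v \in V - \{s_k, t_k\} && (5)\\
& acf_k(u, v) \ge 0,\ z \ge 0, && \forall k = 1, \ldots, K,\ \forall u, v \in V && (6)
\end{aligned}
$$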



PR' is a reformulation of PR with $O(K \cdot m)$ variables and $O(m + Kn)$ constraints, where $m$ is the number of arcs, $n$ the number of vertices, and $K$ the number of commodities; it can therefore be solved in a polynomial number of steps.
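To make constraints (3) to (6) concrete, here is a small Python feasibility check for a candidate PR' solution; it does not solve the LP. Names are hypothetical, and the example instance used to exercise it (the Figure 1 topology with unit-capacity arcs along the listed paths, flow 4 routed 2-3-4, unit demands) is an assumption made only for this check.

```python
from fractions import Fraction

def satisfies_pr_prime(cap, commodities, acf, z):
    """Check a candidate (acf, z) against constraints (3)-(6) of PR'.

    cap: {(u, v): c(u, v)}; commodities: {k: (s_k, t_k, dem_k)};
    acf: {k: {(u, v): flow}} (arcs missing from acf[k] carry no flow).
    """
    z = Fraction(z)                       # exact comparisons
    nodes = {u for arc in cap for u in arc}
    # (6) non-negativity; flows may only use existing arcs
    if z < 0:
        return False
    for k in commodities:
        if any(f < 0 or arc not in cap for arc, f in acf[k].items()):
            return False
    # (3) total flow of all commodities on an arc is at most its capacity
    for arc, c in cap.items():
        if sum(acf[k].get(arc, 0) for k in commodities) > c:
            return False
    for k, (s, t, dem) in commodities.items():
        flows = acf[k]
        # (4) flow leaving the source is at least z * dem_k
        if sum(f for (u, _), f in flows.items() if u == s) < z * dem:
            return False
        # (5) flow conservation at every vertex other than s_k and t_k
        for v in nodes - {s, t}:
            inflow = sum(f for (_, w), f in flows.items() if w == v)
            outflow = sum(f for (u, _), f in flows.items() if u == v)
            if inflow != outflow:
                return False
    return True
```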

The solution of the PR' problem gives us the same maximal portion of all the demands, restricted by the link capacities. The LP variables hold the maximal value of the allocated net flow per commodity. To derive the set of routing paths for each commodity, we apply the flow decomposition algorithm described in Figure 2 and [18, 17] separately to each commodity. This algorithm runs in a polynomial number of steps. It decomposes the total net flow of commodity $k$ over all arcs into a set of paths.

Flow decomposition algorithm
for each commodity k do
    P_k = ∅                       /* P_k holds the paths for commodity k */
    while TRUE
        let p be a path between (s_k, t_k) using only arcs with acf_k(u, v) > 0
        if path p was found then
            minacf_k = min_{(u,v) ∈ p} acf_k(u, v)
            add p to P_k
            f(p) = minacf_k
            ∀(w, x) ∈ p: acf_k(w, x) = acf_k(w, x) - minacf_k
        else exit the while loop   /* continue with the next commodity */

Fig. 2. The flow decomposition algorithm [18].
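A runnable Python sketch of the decomposition in Figure 2 for a single commodity, using breadth-first search to find each path over arcs that still carry flow. The function names are hypothetical, and any flow left on cycles that do not connect $s_k$ to $t_k$ is simply not reported.

```python
from collections import deque

def find_path(flow, s, t):
    """BFS from s to t over arcs with positive flow; returns a node list or None."""
    succ = {}
    for u, v in flow:
        succ.setdefault(u, []).append(v)
    prev = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in succ.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def decompose(acf_k, s, t):
    """Split one commodity's arc flows {(u, v): flow} into (path, f(path)) pairs."""
    flow = {arc: f for arc, f in acf_k.items() if f > 0}
    paths = []
    while True:
        p = find_path(flow, s, t)
        if p is None:
            return paths                      # no s-t path left: next commodity
        arcs = list(zip(p, p[1:]))
        m = min(flow[a] for a in arcs)        # minacf_k: bottleneck along p
        for a in arcs:
            flow[a] -= m
            if flow[a] == 0:
                del flow[a]                   # arc no longer carries flow
        paths.append((p, m))
```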



3.3 WMCM - Weighted Max-Min Fair Concurrent MCF Algorithm 

As noted before, the solution of the maximum concurrent MCF can leave the network unsaturated. Assume two commodities with equal demands and disjoint paths, where the available bandwidth for commodity 1 is larger than that for commodity 2. Restricted by Eq. (4), i.e., by the common fraction $z$, PR' will assign both commodities the same net flow even though commodity 1 has more available bandwidth. We have developed a new algorithm that merges the desire for network saturation with fairness. The formal description of the algorithm is given in Figure 3.

The main idea behind our WMCM algorithm is to increase the allocated net flow per arc over the residual graph while keeping the weighted fairness criterion at each iteration. We use the solution of the maximum concurrent MCF problem to increase it optimally. The algorithm receives as input the list of commodities, KCOMM, the vector of demands, dem, and the graph, G. Each iteration starts (see line 4) with a reduced number of commodities (line 19), recalculated demands (line 17), and a residual graph of the unassigned capacities (lines 10-12). $\Delta acf_k(i, j)$ holds the net flow assigned in the iteration, and $z$ holds the fraction of the demands that was fulfilled. Finally, $acf_k(i, j)$ holds the total accumulated net flow of commodity $k$ over arc $(i, j)$. The flow decomposition algorithm is performed only once (line 21) and provides the maximum routing with the weighted max-min fair allocation. The strength of the algorithm is due to the optimality and scalability of the maximum concurrent MCF problem. Its running time is $K \cdot T_{concMCF}$, where $T_{concMCF}$ is the running time of solving the maximum concurrent MCF LP.

WMCM(KCOMM, dem, G)
1.  /* Initialization stage */
2.  ∀(i, j) ∈ A, k = 1..K: acf_k(i, j) = 0
3.  G_res = G /* G is the original graph */
4.  while (KCOMM ≠ null) do
5.    /* The calculation of the increased net flow by this iteration */
6.    Perform LP PR' on G_res with KCOMM and dem_k
7.    Returns: z and Δacf_k(i, j) ∀a = (i, j) ∈ A
8.    /* Variable maintenance for the next iteration */
9.    /* G_res calculation: c(a) holds link capacities */
10.   ∀a = (i, j) ∈ A, k = 1..K:
11.     c(a) = c(a) - Δacf_k(i, j)
12.     if c(a) = 0, A = A \ {a} /* prune saturated links */
13.   /* Commodities and demands calculations */
14.   lastKCOMM = KCOMM
15.   for commodity k ∈ lastKCOMM do
16.     ∀a = (i, j): acf_k(i, j) = acf_k(i, j) + Δacf_k(i, j)
17.     dem_k = dem_k - z · dem_k
18.     if G_res has no connectivity for k then /* k has max. possible assignment */
19.       KCOMM = KCOMM \ {k}
20. /* end of while */
21. Perform "flow decomposition algorithm" on G and acf_k(u, v), k = 1..K
22. Returns per commodity k: set of paths P_k and flows f(p_j), ∀p_j ∈ P_k



Fig. 3. WMCM weighted max-min fair concurrent allocation algorithm 



The WMCM algorithm provides us with a commodity rate vector $cr$ and a set of flow rate vectors $f_k$, the rates of the paths $P_j \in P_k$, $k = 1, \ldots, K$, composing each $cr_k$.

Theorem 1. The commodity rate vector cr provided by the WMCM is weighted max-min 
fair. 

Before proving Theorem 1, let us state the following lemma and explanations. The rate vector $acf^n$ is composed of the accumulated net flow $acf_k(s_k, j)$ of each commodity $k$ at the end of iteration $n$, where $s_k$ is the source of commodity $k$. $K_n$ is the set of the commodities that participate in iteration $n$.

Lemma 1. There exist $x(n)$, $y(n)$, and $u(n)$ such that, $\forall i \in K_n$, we have the following:

- $\Delta acf_i^n = y(n) \cdot dem_i$ (the increased rate is in proportion to the demand)

- $acf_i^n = u(n) \cdot dem_i$ (the accumulated rate is in proportion to the demand)

- $demRes_i^n = x(n) \cdot dem_i$ (the residual demand is in proportion to the original demand).

Proof. We prove by induction on the number of iterations.

For $n = 1$, the recalculated demand is $(1 - z_1) \cdot dem_i$ (see line 17), the increased rate (calculated in line 7) is $z_1 \cdot dem_i$, and the total rate is $z_1 \cdot dem_i$.

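The following three equations are the induction assumption for iteration $n$:

$$\Delta acf_i^{n} = z_n \cdot \prod_{l=1}^{n-1} (1 - z_l) \cdot dem_i \qquad \text{/* Rate increase */} \tag{7}$$

$$acf_i^{n} = \Big( \sum_{l=1}^{n} z_l \cdot \prod_{m=1}^{l-1} (1 - z_m) \Big) \cdot dem_i \qquad \text{/* Total rate */} \tag{8}$$

$$demRes_i^{n} = \prod_{l=1}^{n} (1 - z_l) \cdot dem_i \qquad \text{/* Residual demands */} \tag{9}$$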

Proof for iteration $n + 1$: $KCOMM^{n+1}$ holds the commodities that participate in iteration $n + 1$. The rate increase (WMCM line 7) for any commodity $i \in KCOMM^{n+1}$ is:

$$\Delta acf_i^{n+1} = z_{n+1} \cdot demRes_i^{n} = z_{n+1} \cdot \prod_{l=1}^{n} (1 - z_l) \cdot dem_i \tag{10}$$

where the second equality follows from Eq. (9).

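The accumulated rate for any commodity $i \in KCOMM^{n+1}$ is:

$$
\begin{aligned}
acf_i^{n+1} &= acf_i^{n} + \Delta acf_i^{n+1} \\
&= \Big( \sum_{l=1}^{n} z_l \cdot \prod_{m=1}^{l-1} (1 - z_m) \Big) \cdot dem_i + z_{n+1} \cdot \prod_{h=1}^{n} (1 - z_h) \cdot dem_i \\
&= \Big( \sum_{l=1}^{n+1} z_l \cdot \prod_{m=1}^{l-1} (1 - z_m) \Big) \cdot dem_i
\end{aligned}
$$

where the second equality follows from Eqs. (8) and (10).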

The residual demands are calculated at the end of iteration n + 1 as follows:

$$demRes_i^{n+1} = (1 - z_{n+1}) \cdot demRes_i^n = \prod_{l=1}^{n+1} (1 - z_l) \cdot dem_i \qquad (11)$$

where the second equality follows from Eq. (9).

The lemma is proved by setting $x(n+1) = \prod_{l=1}^{n+1} (1 - z_l)$, $y(n+1) = z_{n+1} \cdot \prod_{l=1}^{n} (1 - z_l)$ and $u(n+1) = \sum_{l=1}^{n+1} z_l \cdot \prod_{m=1}^{l-1} (1 - z_m)$, as required.

□ 
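The closed forms of Lemma 1 can be checked numerically by running the per-iteration recursion directly. The sketch below assumes an arbitrary, hypothetical sequence of fractions z_1, ..., z_N in [0, 1) (in WMCM these values come from the linear program solved in each iteration) and a single demand dem_i. It applies the rate increase z·demRes and the residual update demRes ← (1 − z)·demRes, then compares the result with Eqs. (8) and (9).

```python
from math import prod


def iterate_rates(z, dem):
    """Run the per-iteration recursion for one commodity.

    z   -- fractions z_1..z_N of the residual demand granted in each iteration
    dem -- original demand dem_i
    Returns (accumulated rate acf_i^N, residual demand demRes_i^N).
    """
    acf, dem_res = 0.0, dem
    for z_n in z:
        delta = z_n * dem_res       # rate increase, cf. Eq. (10)
        acf += delta                # accumulated rate
        dem_res *= (1.0 - z_n)      # recalculated residual demand, cf. Eq. (11)
    return acf, dem_res


def closed_form(z, dem):
    """Evaluate Eqs. (8) and (9) for n = len(z) (0-based indices internally)."""
    n = len(z)
    # u(n) = sum_l z_l * prod_{m<l} (1 - z_m); prod of an empty sequence is 1
    u = sum(z[l] * prod(1.0 - z[m] for m in range(l)) for l in range(n))
    # x(n) = prod_l (1 - z_l)
    x = prod(1.0 - z_l for z_l in z)
    return u * dem, x * dem
```

Since every unit removed from the residual demand is added to the accumulated rate, acf_i + demRes_i stays equal to dem_i, which is a quick sanity check on both functions.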

We now return to the proof of Theorem 1.
Proof. We prove the theorem by induction.

Base step: In the first iteration (n = 1), $acf^1$ is the solution of PR', where for all commodities i and j, $acf_i^1 = z_1 \cdot dem_i$ and $acf_j^1 = z_1 \cdot dem_j$, implying $acf_i^1 / acf_j^1 = dem_i / dem_j$. $acf^1$ cannot be increased in this iteration due to the PR' restrictions (line 7).
Induction assumption: The weighted max-min fair order holds for iteration n, i.e., $acf^n$ is feasible and, for each commodity i, $acf_i^n$ cannot be increased without decreasing $acf_j^n$ for some commodity j for which $acf_i^n / acf_j^n \ge dem_i / dem_j$.

Iteration n + 1: $KCOMM^{n+1}$ is the set of all commodities that participate in iteration n + 1. $KCOMM_{SAT}$ is the set of all commodities that were saturated in one of the previous iterations. We distinguish three cases for any two commodities i and j:

- Case 1: Both commodities were saturated in previous iterations, i.e., $i, j \in KCOMM_{SAT}$. This case holds trivially by the induction assumption.

- Case 2: Only one of the two commodities was saturated before. W.l.o.g., assume that $i \in KCOMM^{n+1}$ and $j \in KCOMM_{SAT}$. Commodity j cannot increase its flow, since it was deleted from the list (see line 19 in WMCM). If it was deleted in the previous iteration n, then $acf_i^n / acf_j^n = dem_i / dem_j$ holds before iteration n + 1 starts, and thus any increase in the rate of commodity i (see lines 7 and 16) implies $acf_i^{n+1} / acf_j^{n+1} > dem_i / dem_j$. If j was deleted before the previous iteration n, we know that $acf_i^n / acf_j^n > dem_i / dem_j$, and any increase in the rate of i preserves this relation.

- Case 3: Both commodities participate in iteration n + 1, i.e., $i, j \in KCOMM^{n+1}$. Since both commodities participated in all previous iterations, they gained rates such that $acf_i^n / acf_j^n = dem_i / dem_j$. As proved in Lemma 1, the rate increase in this iteration preserves this relation, so that $acf_i^{n+1} / acf_j^{n+1} = dem_i / dem_j$.

Finally, the set KCOMM of participating commodities is reduced in each iteration, which ensures termination.

□ 
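To make the weighted max-min fair criterion concrete, the following sketch applies weighted progressive filling to commodities that are each restricted to a single fixed path. This is a simplified stand-in, not WMCM itself: WMCM splits commodities over multiple paths and solves a linear program in each iteration. The sketch does keep the structure used in the proof above: in every iteration all active commodities grow in proportion to their demands, and commodities that reach their demand or cross a saturated link are removed from the active set. The function name, the dictionary layout and the example values are hypothetical, and demands are assumed to be positive.

```python
def weighted_progressive_filling(capacity, paths, demand, eps=1e-9):
    """Weighted max-min fair rates for single-path commodities.

    capacity -- dict link -> capacity
    paths    -- dict commodity -> list of links used by that commodity
    demand   -- dict commodity -> positive demand (also used as the weight)
    """
    rate = {k: 0.0 for k in demand}
    active = set(demand)

    def residual(link):
        return capacity[link] - sum(rate[k] for k in paths if link in paths[k])

    while active:
        # Largest common growth level dt such that every active commodity can
        # gain dt * demand without exceeding a link capacity or its own demand.
        limits = [(demand[k] - rate[k]) / demand[k] for k in active]
        for link in capacity:
            weight = sum(demand[k] for k in active if link in paths[k])
            if weight > 0:
                limits.append(residual(link) / weight)
        dt = max(0.0, min(limits))
        for k in active:
            rate[k] += dt * demand[k]   # increase in proportion to the demand
        # Remove commodities that met their demand or cross a saturated link.
        saturated = {link for link in capacity if residual(link) <= eps}
        active -= {k for k in active
                   if demand[k] - rate[k] <= eps or saturated.intersection(paths[k])}
    return rate
```

With capacities a = 10 and b = 2, and commodities c1 (path a, demand 8) and c2 (path a then b, demand 4), c2 is frozen at rate 2 when link b saturates, after which c1 alone grows to its full demand of 8.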



4 Concluding Remarks and Subsequent Work

We presented a new traffic engineering algorithm for routing demands in a network in a way that maximizes the flows and maintains fairness. The weighted max-min fair criterion better serves current quality-of-service needs with respect to client demands. The algorithm can be used when client requests exceed capacity, to minimize the loss of revenue and maximize customer satisfaction. It can also be applied when the network is not congested, to lower the maximum load on the links. The WMCM algorithm provides an optimal weighted max-min fair commodity rate vector by iteratively solving a linear program. The running time of linear programming techniques can be large, though polynomial. For this reason, in subsequent work [19] we developed an ε-approximation algorithm that builds on some of the ideas of this work and solves the problem faster.
Abstract. Traffic engineering capabilities defined in MPLS enable QoS online routing of LSPs. In this paper we address issues of network survivability in online routing. We define a new link weight, LFL (Lost Flow in Link), that can be applied for dynamic routing of LSPs in survivable MPLS networks. We propose to use the LFL metric as a “scaling factor” for existing dynamic routing algorithms that make use of additive metrics. Results of simulations show that the new metric can substantially improve network survivability, defined as the lost flow function after KSP local rerouting. However, the application of the LFL metric slightly increases the number of rejected calls.

Keywords: Online routing, survivability, QoS. 



1 Introduction 

In recent years, there has been an increasing demand for network survivability. In response to this demand, the research community has been intensively investigating issues of static and dynamic optimization of survivable networks. In this paper we propose a new approach to improve the effectiveness of online routing from the perspective of network survivability. We define a link weight that can be applied as an enhancement of existing constraint-based online routing algorithms. Our interest focuses on the Multiprotocol Label Switching (MPLS) technique [11]. The main idea of MPLS survivability is as follows. Each circuit, or LSP (label switched path), has a working route and a backup route. The working route is used for transmitting data in the normal, failure-free state of the network. After a failure of the working route, the failed circuit is switched to the backup route. In this work we focus on local restoration (also called local repair) [3], [12]. We assume that the backup route is found only around the failed link, and that the origin node of the failed link is responsible for rerouting. In modern computer networks a single-link failure is the most common and most frequently reported failure event [3]. Therefore, most optimization models consider a single-link failure as the basic failure event.

We assume that LSPs are not known a priori. We apply the explicit routing of MPLS proposed in [11]. Since the ingress node of the LSP makes the routing decision, congested links can be avoided. In addition, dynamic link weights can be used instead of static ones. Therefore, paths that meet the required QoS parameters can be selected. We assume that the route of each new LSP is determined by the shortest path algorithm using the selected link weight. The following link state information is either flooded by the routing protocol or known administratively: total flow on each link, link capacity, and network topology. The first two items require the traffic engineering extensions of OSPF [6], [13].

In this paper we apply the LFL (Lost Flow in Link) function defined by the author in [15]. LFL can be effectively applied for the assignment of working routes in connection-oriented networks using local restoration. Here, LFL is used to define a new link weight that can be applied for dynamic routing of LSPs in survivable MPLS networks. We present and discuss results of extensive simulations run on various networks.



2 Related Work 

In this section we briefly describe several constraint-based dynamic routing algorithms developed for connection-oriented networks. The most common routing algorithm applied in computer networks is the shortest path first (SPF) algorithm based on an administrative weight (metric). A popular metric is the number of hops, applied in the Min Hop Algorithm (MHA). The SPF method applies additive weights: the length of a path is calculated as the sum of the link costs of all links along the path. A valuable enhancement of the SPF method is the feasible network approach. The residual capacity of a link is defined as the difference between the capacity of the link and its current flow, which is calculated as the sum of the bandwidths of the LSPs routed on that link. The feasible network for a new call consists of all routers and of the links whose residual capacity exceeds the bandwidth requirement of the new path. Thus, routing in the feasible network guarantees that allocating a new call will not violate the capacity constraint.
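The feasible network approach combined with SPF can be sketched as follows: links that cannot carry the requested bandwidth are pruned, and Dijkstra's algorithm is run on the remaining links with an additive weight. The graph encoding, the function name `cspf` and the choice to keep links whose residual capacity is at least (rather than strictly above) the requested bandwidth are assumptions of this sketch. With the default weight of 1 per link, it behaves like MHA restricted to the feasible network.

```python
import heapq


def cspf(links, src, dst, bw, weight=lambda attrs: 1.0):
    """Shortest path in the feasible network for a new LSP.

    links  -- dict mapping (u, v) -> {"cap": capacity, "flow": current flow}
    bw     -- bandwidth requirement of the new LSP
    weight -- additive link weight computed from the link attributes;
              the default of 1 per link gives hop count (MHA)
    Returns the list of nodes on the path, or None if the LSP is rejected.
    """
    # Feasible network: keep only links whose residual capacity can carry bw.
    adj = {}
    for (u, v), attrs in links.items():
        if attrs["cap"] - attrs["flow"] >= bw:
            adj.setdefault(u, []).append((v, weight(attrs)))
    # Dijkstra's algorithm on the feasible network.
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

Passing a different `weight` callable (for example one based on residual capacity) turns the same routine into other SPF-based variants without changing the pruning step.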

Another dynamic routing algorithm is the minimum interference routing algorithm (MIRA) proposed in [7]. The main idea of MIRA is to avoid selecting “critical links” that may “interfere” with potential future paths. The objective of MIRA is to find a feasible path that contains the least number of critical links. An approach similar to MIRA was also applied in [4-5], [8].

The authors of [14] proposed a routing algorithm called the dynamic online routing algorithm (DORA). The main goal of DORA is to utilize existing network resources effectively and to minimize network congestion by carefully mapping paths across the network. Results presented in [14] show that DORA requires fewer paths to be rerouted and obtains a higher successful reroute percentage than either SPF or MIRA. DORA is computationally less expensive than MIRA and has the same runtime as SPF.

LIOA (Least Interference Optimization Algorithm), described in [1], reduces the interference among competing flows by balancing the number and quantity of flows carried by a link. LIOA is based on the SPF method and the feasible network approach. Results of simulations in [1] show that LIOA performs better than MIRA in terms of rejection ratio and successful rerouting upon a single-link failure.
Constraint-based routing research has typically concentrated on routing a single path without taking survivability requirements into account. Recently, there has been much interest in restorable QoS routing, where working and backup paths are set up simultaneously to protect the network against failures. Such an approach was applied in [5], [8-9]. In our work we also address the problem of online routing in restorable MPLS networks. However, we establish only working paths. We assume that when a failure occurs, backup paths are established dynamically using the link recovery approach. Backup paths are not established in advance; consequently, capacity does not have to be reserved for them. We accept the fact that some LSPs affected by a failure may not be rerouted due to limited residual capacity. This means that an LSP admitted to the network and allocated to a working path may not be restored when a failure occurs. Reserving extra spare capacity for each LSP is not always economically justified. Some LSPs carry traffic of low importance. Such LSPs can be lost in an emergency situation in order to enable better restoration of other, high-valued LSPs. The concept of an SLA (Service Level Agreement) can be used to facilitate LSP prioritization. The objective is to minimize the overall lost bandwidth of the LSPs that are not rerouted. A similar approach was proposed in [10].



3 Definition of a New Metric for Survivable Online Routing 

In this section we define a new link metric that can be applied for online routing of LSPs in order to improve network survivability. The metric is based on the LFL function defined and discussed in [15]. We assume that link rerouting is used as the restoration method. The MPLS network is modeled as $(G, c)$, where $G = (V, A)$ is a directed graph with n vertices representing routers or switches and m arcs representing links, and $c : A \to \mathbb{R}^+$ is a function that defines the capacities of the arcs. We denote by $o : A \to V$ and $d : A \to V$ the functions defining the origin and destination node of each arc.

To represent the problem mathematically, we introduce the following notation:

$f_a$: total flow on arc a, i.e., the sum of the bandwidth requirements of the LSPs that use arc a.
$c_a$: capacity of arc a.
$g_v^{out} = \sum_{i:\, o(i)=v} f_i$: aggregate flow of the outgoing arcs of v.
$g_v^{in} = \sum_{i:\, d(i)=v} f_i$: aggregate flow of the incoming arcs of v.
$e_v^{out} = \sum_{i:\, o(i)=v} c_i$: aggregate capacity of the outgoing arcs of v.
$e_v^{in} = \sum_{i:\, d(i)=v} c_i$: aggregate capacity of the incoming arcs of v.
The MPLS flow is modeled as a non-bifurcated multicommodity (m.c.) flow denoted by $f = [f_1, f_2, \dots, f_m]$.

For the sake of simplicity we introduce the following function:

$$\theta(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \ge 0 \end{cases} \qquad (1)$$



To examine the main characteristics of local restoration, we consider an arc $k \in A$ and assume that k fails. In local rerouting, the flow on arc k must be rerouted by the source node of arc k. Therefore, the spare capacity of the arcs leaving $o(k)$, other than k, is a potential bottleneck of the restoration process. If the spare capacity of the arcs leaving node $o(k)$ is relatively small, some flow of arc k could be lost. This spare capacity equals $(e^{out}_{o(k)} - c_k) - (g^{out}_{o(k)} - f_k)$, and the part of $f_k$ exceeding it is lost. We therefore define the function $L_k^{out}$ of the flow of arc k lost in node $o(k)$ as follows:

$$L_k^{out}(f) = \theta\left(g^{out}_{o(k)} - (e^{out}_{o(k)} - c_k)\right) \qquad (2)$$
Note that $L_k^{out}$ denotes the lost flow that cannot be restored using the arcs leaving node $o(k)$ due to the limited spare capacity of these arcs. Correspondingly, we define the function $L_k^{in}$ of lost flow that cannot be restored using the arcs entering node $d(k)$:

$$L_k^{in}(f) = \theta\left(g^{in}_{d(k)} - (e^{in}_{d(k)} - c_k)\right) \qquad (3)$$



Function $L_k$ in (4) is a linear combination of the flow lost in the arcs leaving $o(k)$ and in the arcs entering $d(k)$. $L_k$ only estimates the flow of arc k lost after local rerouting.

$$L_k(f) = 0.5\left(L_k^{out}(f) + L_k^{in}(f)\right) \qquad (4)$$

In (4) we weight $L_k^{out}$ and $L_k^{in}$ equally; however, other combinations of these two functions can also be applied. According to our simulations, $L_k(f)$ generally gives similar performance for various values of the proportion ratio. Using $L_k(f)$ we can define a function $L(f)$ that shows how well the whole network is prepared for local rerouting after a failure of any single arc. We assume that the probability of failure is the same for all arcs; therefore, the probability is not included in this function.

$$L(f) = \sum_{k \in A} L_k(f) = 0.5\left(\sum_{k \in A} L_k^{out}(f) + \sum_{k \in A} L_k^{in}(f)\right) \qquad (5)$$

For more details and a comprehensive discussion of the LFL function, refer to [15].
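The objective L(f) can be evaluated directly from per-arc flows and capacities. The sketch below assumes that arcs are given as a dict mapping (o(a), d(a)) to (f_a, c_a), so parallel arcs between the same node pair are not supported. For each arc k it takes the lost flow at the origin as θ(g^out_{o(k)} − (e^out_{o(k)} − c_k)), i.e., the part of f_k that exceeds the spare capacity of the other arcs leaving o(k), does the same for the arcs entering d(k), averages the two as in (4), and sums over all arcs as in (5).

```python
from collections import defaultdict


def theta(x):
    """Eq. (1): 0 for x < 0, x otherwise."""
    return x if x > 0 else 0.0


def lost_flow(arcs):
    """LFL objective L(f) of Eq. (5).

    arcs -- dict mapping (origin, destination) -> (flow f_a, capacity c_a)
    """
    g_out, g_in = defaultdict(float), defaultdict(float)
    e_out, e_in = defaultdict(float), defaultdict(float)
    for (o, d), (f, c) in arcs.items():
        g_out[o] += f
        e_out[o] += c
        g_in[d] += f
        e_in[d] += c
    total = 0.0
    for (o, d), (f, c) in arcs.items():
        # flow of arc k not restorable via the other arcs leaving o(k)
        l_out = theta(g_out[o] - (e_out[o] - c))
        # flow of arc k not restorable via the other arcs entering d(k)
        l_in = theta(g_in[d] - (e_in[d] - c))
        total += 0.5 * (l_out + l_in)   # Eq. (4), summed as in Eq. (5)
    return total
```

For example, with arcs A→B (flow 6, capacity 10), A→C (3, 5) and C→B (0, 10), a failure of A→B leaves only 2 units of spare capacity on A→C, so 4 units are counted at node A; node C has no alternative incoming arc, so all 3 units of A→C are counted there, giving L(f) = 0.5·4 + 0.5·3 = 3.5.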

To simplify the derivation, we introduce the following function:







K. Walkowiak 



$$\mu(x) = \begin{cases} 0 & \text{for } x \le 0 \\ 1 & \text{for } x > 0 \end{cases} \qquad (6)$$



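Next, we define the following two functions:

$$l_v^{out}(f) = \sum_{i:\, o(i) = v} \mu\left(g_v^{out} - (e_v^{out} - c_i)\right) \qquad (7)$$

$$l_v^{in}(f) = \sum_{i:\, d(i) = v} \mu\left(g_v^{in} - (e_v^{in} - c_i)\right) \qquad (8)$$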





To find a new link weight we use, as in [2], the partial derivative $\partial L / \partial f_i$ of the objective function. However, the LFL function is not differentiable everywhere [15]. Therefore, combining functions (7) and (8), we define a new link weight as follows:

$$l_i^{LFL} = 0.5\left(l_{o(i)}^{out}(f) + l_{d(i)}^{in}(f)\right) \qquad (9)$$

Note that $l_i^{LFL}$ is the partial derivative $\partial L / \partial f_i$ for a feasible flow f, except at the points $g_v^{out} = e_v^{out} - c_i$ for all $i: o(i) = v$ and $g_v^{in} = e_v^{in} - c_i$ for all $i: d(i) = v$. At these points $l_i^{LFL}$ is equal to the left-sided derivative of the function L.

The weight $l_i^{LFL}$ of link i shows the change in the objective function if an incremental amount of demand is routed on that link or on other links adjacent to it. Note that if the network saturation is relatively low, the value of $l_i^{LFL}$ is 0.

When the network load increases and reaches a particular value, the weight $l_i^{LFL}$ starts to grow. $l_i^{LFL}$ can be used as a link weight in online routing algorithms based on the SPF method. However, we propose to use $l_i^{LFL}$ as a scaling factor for the link metric $l_i^{METRIC}$ of any existing online routing algorithm that applies additive metrics, in the following way:

$$l_i^{METRIC\_LFL} = (1 + l_i^{LFL})\, l_i^{METRIC} \qquad (10)$$

According to the definition and discussion of $l_i^{LFL}$, the scaling factor “turns on” only when the network is relatively highly saturated; for a lightly loaded network the original metric does not change. The computational cost of using the new metric in a routing algorithm is $O(mn)$, the same as in Constrained Shortest Path First (CSPF), MHA, and the second stage of DORA. This means that applying the LFL metric does not worsen the computational complexity of these algorithms. In contrast, MIRA has a complexity of $O(n^5) + O(m^2)$, and the first stage of DORA has a complexity of $O(nm)$ [14].
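The link weight and the scaled metric can be computed in one pass over the arcs. The sketch below computes l_i^LFL = 0.5(l^out_{o(i)} + l^in_{d(i)}), where l^out_v counts the arcs k leaving v for which g^out_v − (e^out_v − c_k) > 0 (and l^in_v does the same for the arcs entering v), and then scales a base metric as in (10). It assumes μ(0) = 0, so that the value at the kinks coincides with the left-sided derivative, and it encodes arcs as a dict mapping (origin, destination) to (flow, capacity). The hop-count base metric in the example is a hypothetical choice.

```python
from collections import defaultdict


def mu(x):
    """Eq. (6): step function, taken here as 0 for x <= 0 and 1 for x > 0."""
    return 1.0 if x > 0 else 0.0


def lfl_weights(arcs):
    """Return l_i^LFL for every arc i, following Eqs. (7)-(9).

    arcs -- dict mapping (origin, destination) -> (flow f_a, capacity c_a)
    """
    g_out, g_in = defaultdict(float), defaultdict(float)
    e_out, e_in = defaultdict(float), defaultdict(float)
    for (o, d), (f, c) in arcs.items():
        g_out[o] += f
        e_out[o] += c
        g_in[d] += f
        e_in[d] += c
    l_out, l_in = defaultdict(float), defaultdict(float)
    for (o, d), (_, c) in arcs.items():
        l_out[o] += mu(g_out[o] - (e_out[o] - c))   # Eq. (7), node v = o
        l_in[d] += mu(g_in[d] - (e_in[d] - c))      # Eq. (8), node v = d
    return {(o, d): 0.5 * (l_out[o] + l_in[d]) for (o, d) in arcs}  # Eq. (9)


def scaled_metric(base, lfl):
    """Eq. (10): l^METRIC_LFL = (1 + l^LFL) * l^METRIC, per arc."""
    return {a: (1.0 + lfl[a]) * base[a] for a in base}
```

Because l_i^LFL is 0 on arcs whose end nodes are lightly loaded, the scaled metric equals the base metric there, which matches the “turns on” behaviour described above.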
4 Results

We conducted simulation experiments to evaluate the influence of the LFL scaling factor applied to five popular online routing algorithms: MHA, MIRA [7], CSPF, DORA [14] with the bandwidth proportion parameter BWP = 0.5, and LIOA [1] with the calibration parameter α = 0.5. For each tested algorithm we use the LFL metric as in (10). As the major performance indicator we apply the lost flow function using the KSP (k-shortest paths) rerouting method presented in [3], [10]. The KSP lost flow function is calculated as follows. For each link of the network, we assume a failure of this link. Next, we try to reroute all LSPs that traverse the failed link using the KSP method. The bandwidth requirements of the LSPs that cannot be restored due to limited residual capacity are summed. The resulting KSP lost flow function is an aggregate performance metric that shows the capability of the network to perform local restoration after a failure of any single link.
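A minimal sketch of the KSP lost flow function, assuming local restoration around the failed link: for every link, the LSPs that traverse it are rerouted one by one from its origin to its destination over the k shortest (by hop count) simple paths that avoid the failed link, and each LSP takes the first candidate path with enough residual capacity. The LSP processing order, the value k = 3, the first-fit choice and the exhaustive path enumeration (practical only for small networks) are assumptions of this sketch; the method in [3], [10] may differ in these details.

```python
def simple_paths(adj, src, dst, banned):
    """All simple paths src -> dst that do not use the arc `banned` (exhaustive DFS)."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path and (node, nxt) != banned:
                stack.append((nxt, path + [nxt]))


def ksp_lost_flow(capacity, lsps, k=3):
    """Sum, over all single-link failures, of the bandwidth of LSPs that cannot be restored.

    capacity -- dict (u, v) -> link capacity
    lsps     -- list of (bandwidth, path given as a list of nodes)
    """
    adj = {}
    for (u, v) in capacity:
        adj.setdefault(u, []).append(v)
    flow = dict.fromkeys(capacity, 0.0)
    for bw, path in lsps:
        for arc in zip(path, path[1:]):
            flow[arc] += bw
    total_lost = 0.0
    for failed in capacity:
        residual = {a: capacity[a] - flow[a] for a in capacity}
        affected = [bw for bw, p in lsps if failed in zip(p, p[1:])]
        o, d = failed
        candidates = sorted(simple_paths(adj, o, d, failed), key=len)[:k]
        for bw in affected:
            for cand in candidates:
                arcs = list(zip(cand, cand[1:]))
                if all(residual[a] >= bw for a in arcs):
                    for a in arcs:
                        residual[a] -= bw   # reserve capacity on the local detour
                    break
            else:
                total_lost += bw            # no candidate detour had room for this LSP
    return total_lost
```

Each failure scenario starts from the same pre-failure residual capacities, so the result aggregates independent single-link failures rather than a sequence of them.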

Results presented in this section are obtained from simulations on the network used as a benchmark in [1], [4] and [7]. The network consists of 15 nodes and 56 directed links. The link capacity is 12, modeling the OC-12 capacity ratio. During the simulations, the link capacities were scaled by 100 to allow thousands of LSPs to be established. Each network node is used as an ingress and egress node, which results in 210 (15 × 14) ingress-egress pairs. The authors of [7] considered LSPs between only 4 ingress-egress pairs. We apply the same network as in [1], [4] and [7] in order to enable a fair evaluation of our approach against other well-known algorithms proposed in the literature. Moreover, since MIRA is computationally expensive, we limit our simulations to a relatively small network.



Table 1. LFL performance improvement of KSP lost flow aggregated over 10 trials

AVLU    MIRA      DORA      LIOA      MHA       CSPF
0.40     0.00%    25.81%    15.38%    40.64%    16.67%
0.45     0.00%    29.46%    28.69%    38.13%    29.84%
0.50    11.76%    22.45%    25.58%    45.40%    26.95%
0.55    14.97%    21.45%    25.95%    52.43%    26.71%
0.60    16.33%    20.76%    23.40%    50.29%    23.54%
0.65    18.34%    21.32%    19.66%    41.97%    19.56%
0.70    15.00%    18.18%    14.02%    34.88%    14.01%
0.75     6.12%     6.98%    11.70%    26.54%     9.23%
0.80     2.11%    -1.92%     2.21%    18.64%     0.61%


We consider static paths resembling long-lived MPLS tunnels that, once established, stay in the network for a long time [3]. Sets of LSPs are generated randomly. To compare weights in terms of the KSP lost flow function, we created sets of demands such that each LSP can be established for each analyzed link weight. In other words, for each tested link weight all LSPs are satisfied and no LSP is rejected. This simulation scenario enables a fair comparison of all link metrics in terms of the KSP lost flow function. If some LSPs were not established for a particular weight, comparing the metrics' performance would be meaningless, because the number of demands placed in the network would not be the same for all tested metrics. We assume that the requested bandwidth of LSPs is randomly distributed between 1 and 8 units of bandwidth. The average number of LSPs per trial is 5600. We repeat the simulations for 10 different sets of demands. We record the KSP lost flow function after each LSP setup.

Another objective of our simulation study is to examine the influence of the LFL scaling factor on the path setup rejection ratio. In this case we do not guarantee that all LSPs in a given experiment are established. We examine 10 sets of 10000 static long-lived LSPs and 10 sets of 25000 dynamic short-lived connections.

To present the results we use the AVLU (Average Link Utilization) parameter, which shows the average network saturation. For instance, AVLU = 0.5 means that on average 50% of the link capacity is allocated to LSPs. Another parameter is the LFL performance improvement, calculated as (RES - RES_LFL)/RES, where RES is the result obtained for the standard routing algorithm and RES_LFL is the result obtained for the same algorithm using the LFL scaling factor as in (10).
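Both reporting parameters are simple ratios; the sketch below spells them out. AVLU is taken here as the unweighted mean of the per-link utilizations f_a/c_a, which is an assumption (the text does not say whether a capacity-weighted average is used; the two coincide when all links have equal capacity, as in the benchmark network).

```python
def avlu(flows, capacities):
    """Average link utilization: mean of f_a / c_a over all links (assumed unweighted)."""
    return sum(f / c for f, c in zip(flows, capacities)) / len(capacities)


def lfl_improvement(res, res_lfl):
    """LFL performance improvement (RES - RES_LFL) / RES.

    For lower-is-better results (lost flow, rejected LSPs, capacity usage)
    a positive value means the LFL variant did better.
    """
    return (res - res_lfl) / res
```

For instance, a KSP lost flow of 200 without LFL and 150 with LFL gives an improvement of 0.25 (25%), while a rejection count that rises from 100 to 102 gives -0.02 (-2%), the kind of small negative value reported in Table 2.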

In Table 1 we report the LFL performance improvement of the KSP lost flow function for AVLU between 0.35 and 0.8. In only one case, DORA in a heavily loaded network (AVLU = 0.8), does the LFL scaling factor worsen the KSP lost flow function; for MIRA at AVLU = 0.40 and 0.45 it leaves the function unchanged. In all other cases, turning LFL on improves network survivability. The largest gain is obtained for the MHA metric. This can be explained by the fact that MHA uses no information on network congestion, so applying the LFL metric introduces additional traffic engineering information. Fig. 1 shows the performance improvement obtained for AVLU = 0.7.




[Plot: one series per algorithm: MIRA, DORA, LIOA, MHA, CSPF]

Fig. 1. LFL performance improvement of KSP lost flow for AVLU=0.7

Overall, the good performance of the LFL scaling factor is consistent with the theoretical considerations presented in Section 3: LFL tries to leave capacity open in regions of the network that are critical for restoration. The lowest KSP lost flow function is obtained for the metrics MIRA_LFL and LIOA_LFL. The former is the best for lightly and moderately saturated networks (AVLU ≤ 0.65), while the latter outperforms the other weights for loaded networks (AVLU > 0.65). This can be seen in Fig. 2, which shows the KSP lost flow function after each LSP setup.

Fig. 2. KSP lost flow function as a function of demand number

Table 2 shows the influence of the LFL metric on the performance of the tested metrics in terms of rejected calls. We report aggregate results for static long-lived LSPs and dynamic short-lived LSPs. In both cases, LFL improves the result only for MHA. For the other metrics, the LFL scaling factor worsens the result; however, the degradation does not exceed 2.16%. The lowest number of rejected LSPs is obtained for LIOA. As reported in many other papers, MHA offers the worst performance.




[Plot: x-axis Trial Number; one series per algorithm: MIRA, DORA, LIOA, MHA, CSPF]

Fig. 3. LFL performance improvement of rejected calls for dynamic LSPs



Table 2. LFL performance improvement of rejected LSPs aggregated over 10 trials

                MIRA      DORA      LIOA      MHA       CSPF
Static LSPs     -1.22%    -2.05%    -2.16%    0.44%     -1.86%
Dynamic LSPs    -0.23%    -0.83%              2.52%     -0.68%
Fig. 3 plots the performance improvement of rejected calls for dynamic LSPs. Fig. 4 shows the number of rejects as a function of demand number for the link weights LIOA, LIOA_LFL, MHA and MHA_LFL.




Fig. 4. Number of rejects as a function of demand number

In Table 3 we present how the LFL scaling factor changes the capacity usage, defined as the overall network capacity allocated to LSPs. Applying LFL leads to the selection of longer paths and higher capacity consumption. However, as shown above, this yields good performance in terms of network survivability. On the other hand, the lower residual capacity left by LFL causes blocking of new demands, reflected in a higher number of rejects. Note that for AVLU < 0.5 the performance improvement is 0, since the LFL scaling factor does not activate in less congested networks.



Table 3. LFL performance improvement of capacity usage aggregated over 10 trials

AVLU    MIRA      DORA      LIOA      MHA       CSPF
0.40     0.00%     0.00%     0.00%     0.00%     0.00%
0.50    -0.01%    -0.05%    -0.02%    -0.12%    -0.04%
0.60    -0.11%    -0.18%    -0.15%    -0.60%    -0.20%
0.70    -0.74%    -0.59%    -0.71%    -0.58%    -0.73%
0.80    -0.60%    -1.76%    -1.85%     0.26%    -2.01%



5 Conclusion 



In this paper we have applied the approach proposed in [15] to improve existing online routing algorithms in terms of survivability. We have focused on local repair, one of the methods proposed for recovery of MPLS networks. We have demonstrated that the new LFL metric supports network optimization and protection under single-link failures. Simulation results confirm the robustness of our approach. The largest improvement of network survivability, as measured by the KSP lost flow function, was obtained for the Minimum Hop Algorithm. This is expected, because the classical MHA does not use any traffic engineering data on network congestion. However, LFL can also improve the results of other algorithms that do include information on link saturation. Applying LFL in the considered algorithms does not significantly affect performance in terms of the path setup rejection ratio. Our approach requires less routing information than previous works in this field [4], [7].
Abstract. Conventional Quality of Service (QoS) routing cannot easily be applied to wireless ad-hoc sensor networks because of the unreliable and dynamic nature of such networks. For these networks, we have proposed a framework of Message-initiated Constraint-Based Routing (MCBR), which consists of a QoS specification and a set of QoS-aware meta-strategies. In contrast to most existing ad-hoc routing schemes, which offer no QoS support, MCBR is able to take QoS specifications into account. In this paper, we focus on learning-based meta-strategies. In contrast to most existing QoS routing approaches, learning-based meta-strategies do not create and maintain explicit routes; instead, packets discover and improve the routes during the search for the destination.

Keywords: Mobile and Wireless Networks, Ad-hoc Sensor Networks, 
Meta-strategies, Reinforcement Learning. 



1 Motivation and Introduction 

Large-scale ad-hoc networks of wireless sensors have become an active topic of 
research. Such networks share the following properties: 

— embedded routers - each sensor node acts as a router in addition to sensing 
the environment; 

— dynamic networks - nodes in the network may turn on or off during operation 
due to unexpected failure, battery life, or power management; attributes 
associated with those nodes (locations, sensor readings, load, etc.) may also 
vary over time; 

— resource constrained nodes - each sensor node tends to have small memory 
and limited computational power; 

— dense connectivity - the sensing range in general is much smaller than the 
radio range, and thus the density required for sensing coverage results in a 
dense network; 

— asymmetric links - the communication links are not reversible in general. 
Applications of sensor networks include environment monitoring, traffic control, building management, object tracking, etc. Routing in sensor networks, however, has very different characteristics from routing in traditional communication networks. First, address-based destination specification is replaced by a more general feature-based specification, such as geographic locations [4] [13] or information gains [3]. Second, routing metrics are not just shortest delay, but usually multiple objectives, including energy usage and information density. Third, in addition to peer-to-peer communication, multicast (one-to-many) and converge-cast (many-to-one) are major traffic patterns in sensor networks. Even for peer-to-peer communication, the source/destination pairs are often dynamic (changing from time to time) or mobile (moving during routing).

Various routing mechanisms have been proposed and implemented for sensor 
networks or wireless ad-hoc networks in general [7]; however, most of them do 
not have Quality of Service (QoS) support. Distributed QoS routing strategies 
for mobile ad-hoc networks have also been proposed [2] [1], all of which, however, 
like most other routing strategies, first establish routes between the source and 
the sink and then follow up with a route maintenance phase if the route is broken. 

We have proposed Message-initiated Constraint-Based Routing (MCBR) [14] 
for wireless ad-hoc sensor networks. MCBR is a framework of routing mecha- 
nisms composed of the explicit specification of constraint-based destinations, 
route constraints and QoS requirements for messages, and a set of QoS-aware 
meta-strategies. With the separation of routing specifications from routing 
strategies, general-purpose meta routing strategies can be applied. In contrast 
to most existing ad-hoc routing strategies with no QoS support, MCBR takes 
QoS specifications into account. In this paper, we focus on a set of distributed 
routing strategies based on real-time reinforcement learning [11]. In particular, 
three types of meta-strategies are proposed: real-time search, constrained flood- 
ing, and adaptive spanning tree. All of these use the same reinforcement learning 
core, which estimates and updates the cost from the current node to the des- 
tination. The first two strategies have been presented elsewhere [14], while the 
last one is newly added to this family. The contribution of this paper is twofold: 
first, the use of MCBR for QoS specification, and second, the introduction of 
learning-based meta-strategies. The performance evaluations of these strategies 
and comparisons to AODV [6] are presented as well, using a real application 
scenario for sensor networks. 

The rest of the paper is organized as follows. Section 2 introduces MCBR for 
QoS routing. Section 3 presents three QoS-aware learning-based meta-strategies. 
Section 4 discusses performance evaluations for these strategies. Section 5 con- 
cludes the paper and points out future directions. 



2 MCBR for QoS Routing 

MCBR [14] provides a general, flexible, and compositional mechanism for pro- 
viding QoS message specification and QoS-aware meta-strategies. An MCBR 
message specification consists of a destination constraint, a route constraint, 
and a QoS routing objective. An MCBR meta-strategy becomes QoS-aware by using the message specification. This is along the lines of Smart Packets for Active Networks [9]; however, in MCBR, packets do not carry code. Only the specification (and possibly an additional selection of a particular meta-strategy) is passed through the network. For networks with small data frames, one can even encode various specifications in the nodes and let packets carry only a specification ID with parameters.

A network can be represented as a graph (V, E), where V is the set of nodes and E is the set of connections. For an asymmetric network, $(v, w) \in E$ does not imply $(w, v) \in E$. Given a destination constraint of message m, a node v is a destination node for m iff the destination constraint is satisfied at v. For example, address-based routing, i.e., sending a message to a node with address ad, can be represented using the destination constraint $a = ad$, where a is the address attribute. Geographical routing, e.g., sending a message to a circular region centered at $(x_0, y_0)$ with radius c, can be represented using the destination constraint $(x - x_0)^2 + (y - y_0)^2 \le c^2$, where x and y are location attributes.

Given a local route constraint $C^r_m$ of message $m$, the active network of $(V, E)$ for $m$ is a subnet $(V_m, E_m)$ such that $v \in V_m$ iff $C^r_m$ is satisfied at $v$, and $(v, w) \in E_m$ iff $v, w \in V_m$ and $(v, w) \in E$. For example, a message that should avoid congested nodes while routing to its destination has the local route constraint $l < l_m$, where $l$ is the node's load attribute (e.g., the number of messages in the node's queue) and $l_m$ is the load limit. One can also use geographical constraints (e.g., directional routing) to reduce collisions and save energy in a flooding-based strategy. In general, local route constraints redefine the network connectivity on a message-by-message basis.
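The definition of the active network translates directly into a filter over nodes and directed edges. The sketch below is illustrative only: `active_subnet` is a hypothetical helper, and the load limit $l_m = 5$ and the four-node topology are made-up values.

```python
from typing import Callable, Dict, Set, Tuple

Attrs = Dict[str, float]
Edge = Tuple[str, str]   # directed: (v, w) in E does not imply (w, v) in E


def active_subnet(nodes: Dict[str, Attrs], edges: Set[Edge],
                  cr: Callable[[Attrs], bool]) -> Tuple[Set[str], Set[Edge]]:
    """Return (V_m, E_m): nodes satisfying C^r_m and the edges between them."""
    vm = {v for v, attrs in nodes.items() if cr(attrs)}
    em = {(v, w) for (v, w) in edges if v in vm and w in vm}
    return vm, em


L_M = 5  # hypothetical load limit l_m
nodes = {"s": {"l": 0}, "u": {"l": 7}, "w": {"l": 2}, "d": {"l": 1}}
edges = {("s", "u"), ("u", "d"), ("s", "w"), ("w", "d"), ("d", "w")}
vm, em = active_subnet(nodes, edges, lambda attrs: attrs["l"] < L_M)
print(sorted(vm))  # ['d', 's', 'w']
print(sorted(em))  # [('d', 'w'), ('s', 'w'), ('w', 'd')]
```

The congested node `u` disappears from this message's view of the network, together with every edge touching it.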

MCBR explicitly specifies routing objectives. A local objective function $o$ is defined on a set of attributes: $o: A_1 \times A_2 \times \cdots \times A_n \to \mathbb{R}^+$, where $A_i$ is the domain of attribute $i$ and $\mathbb{R}^+$ is the set of positive real numbers. The value of $o$ at a node $v$, denoted $o(v)$, is $o(a_1, a_2, \ldots, a_n)$, where $a_i$ is the current value of attribute $i$ at node $v$. A local objective function can be a constant such as the unit transmission cost, which induces the shortest path if the objective is minimized. As another example, an energy-aware objective can be defined as $ku + c$, where $u$ is the amount of energy already used by the node, and $k$ and $c$ are constants; with this objective, energy-aware routing can be achieved. Similarly, one may use $k/n + c$ as a local objective, where $n$ is the number of neighbors, to achieve connectivity-aware routing. Multiple objectives can be obtained by combining individual objectives, e.g., in a weighted sum.
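A sketch of these local objectives as plain Python functions of a node's attribute dictionary. The attribute keys `u` (used energy) and `n` (number of neighbors), the constants, and the weights are assumed values chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, float]
Objective = Callable[[Attrs], float]


def unit_cost() -> Objective:
    """Constant per-hop cost; minimizing its sum yields shortest paths."""
    return lambda attrs: 1.0


def energy_aware(k: float, c: float) -> Objective:
    """k*u + c, where u is the energy already used at the node."""
    return lambda attrs: k * attrs["u"] + c


def connectivity_aware(k: float, c: float) -> Objective:
    """k/n + c, where n is the node's number of neighbors."""
    return lambda attrs: k / attrs["n"] + c


def weighted_sum(terms: List[Tuple[float, Objective]]) -> Objective:
    """Multi-objective: a weighted sum of individual local objectives."""
    return lambda attrs: sum(w * o(attrs) for w, o in terms)


node = {"u": 3.0, "n": 4.0}
combo = weighted_sum([(0.5, energy_aware(2.0, 1.0)),
                      (0.5, connectivity_aware(4.0, 1.0))])
print(combo(node))  # 0.5*(2*3 + 1) + 0.5*(4/4 + 1) = 4.5
```

Keeping $c > 0$ ensures every objective stays strictly positive, as the codomain $\mathbb{R}^+$ requires.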

A local objective can be aggregated over the routing path to form a global routing objective. There are two types of global aggregation, additive and concave. As with general QoS specifications, a global objective function $O$ is additive if $O(p) = \sum_{i=0}^{n} o(v_i)$, where $o$ is a local objective function and the path $p$ consists of the sequence of nodes $v_0, \ldots, v_n$. For the meta-strategies discussed in this paper, only additive objectives are considered.
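For instance, with the energy-aware objective $ku + c$ (here assuming $k = c = 1$), the additive global objective can favor a longer path through lightly used nodes over a shorter path through a heavily used one. The attribute values below are hypothetical.

```python
from typing import Callable, Dict, List

Attrs = Dict[str, float]


def additive_objective(path: List[str], attrs: Dict[str, Attrs],
                       o: Callable[[Attrs], float]) -> float:
    """O(p) = o(v_0) + o(v_1) + ... + o(v_n) for p = v_0, ..., v_n."""
    return sum(o(attrs[v]) for v in path)


def energy(at: Attrs, k: float = 1.0, c: float = 1.0) -> float:
    """Energy-aware local objective k*u + c (k = c = 1 assumed)."""
    return k * at["u"] + c


attrs = {"s": {"u": 1.0}, "a": {"u": 5.0}, "b": {"u": 0.0},
         "c": {"u": 0.0}, "d": {"u": 0.0}}
print(additive_objective(["s", "a", "d"], attrs, energy))       # 9.0
print(additive_objective(["s", "b", "c", "d"], attrs, energy))  # 5.0
```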

Problems for MCBR tend to fall into one of two classes. One is anycast, namely finding an optimal path from the source to one of the destination nodes. The other is multicast, namely finding an optimal tree from the source to all the destination nodes.

An MCBR specification for a message $m$ is a tuple $(v^0_m, C^d_m, C^r_m, O_m)$. The goal of routing is to deliver the message from $v^0_m$ to one (anycast) or all (multicast) of the destination nodes $V^d_m$ satisfying $C^d_m$, via a sequence or a tree $p$ of intermediate nodes $v^0_m, \ldots, v^l_m$ such that $C^r_m$ is satisfied at every $v^i_m$ and $O_m(p)$ is minimized over all such $p$. Two messages are considered to be of the same type if they have the same destination and local route constraints as well as the same routing objective.

Note that global route constraints are not defined in MCBR. It is well known that finding an optimal path with an additive objective while satisfying an additive constraint is NP-hard. In contrast, unicast MCBR with an additive objective is essentially a weighted shortest path problem. Our goal is to make MCBR a simple (in terms of computation) yet still powerful (in terms of representation) mechanism for ad-hoc sensor networks.
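Because $o(v) > 0$, this weighted shortest path problem can be solved centrally with Dijkstra's algorithm on node weights, which gives a useful reference point for what the distributed meta-strategies should converge to. The sketch assumes the adjacency lists already describe the active subnet $(V_m, E_m)$; it is a centralized baseline for comparison, not one of the MCBR meta-strategies, and the example graph and costs are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple


def anycast_route(adj: Dict[str, List[str]], o: Callable[[str], float],
                  source: str, destinations: Set[str]
                  ) -> Optional[Tuple[float, List[str]]]:
    """Cheapest path from source to any destination, with O(p) = sum of o(v_i).

    `adj` is assumed to be the active subnet (V_m, E_m) of the message, and
    o(v) >= 0 for every node, as Dijkstra's algorithm requires.
    """
    dist = {source: o(source)}
    prev: Dict[str, str] = {}
    heap = [(dist[source], source)]
    settled: Set[str] = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue
        settled.add(v)
        if v in destinations:          # first destination settled is optimal
            path = [v]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for w in adj.get(v, []):
            nd = d + o(w)              # node-weighted: pay o(w) on entering w
            if w not in dist or nd < dist[w]:
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    return None                        # no destination reachable


cost = {"s": 1.0, "a": 5.0, "b": 1.0, "c": 1.0, "d1": 1.0, "d2": 1.0}
adj = {"s": ["a", "b"], "a": ["d1"], "b": ["c"], "c": ["d2"]}
print(anycast_route(adj, cost.__getitem__, "s", {"d1", "d2"}))
# (4.0, ['s', 'b', 'c', 'd2'])
```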



3 QoS-Aware Learning-Based Meta-strategies

MCBR separates routing specification from routing meta-strategies. One can 
modify an existing routing strategy, such as AODV, to be a QoS-aware meta- 
strategy for MCBR. Most existing strategies, however, establish a route from 
the source to the destination via flooding the network. In this case, extra control 
packets are required for repairing broken routes. 

Here, we propose QoS-aware learning-based meta-strategies. Real-time reinforcement learning [11] has been studied and applied mostly in agent-based path planning [5]. We apply this powerful technique to develop distributed meta routing strategies for sensor networks.

Given a routing specification of a message, including the destination and 
QoS requirements, one can define a cost function on each node, called Q-value, 
indicating the minimum cost-to-go from this node to the destination. For a dis- 
tributed sensor network, the cost is initially unknown, and an initial estimation 
is made according to the type of message. Furthermore, a node also stores its 
neighbors’ Q-values, NQ-values, which are estimated initially according to the 
neighbors’ attributes and updated when packets are received from neighbors. 

The learning-based meta routing strategies typically consist of an initialization phase, a forwarding phase, and a confirmation phase. Learning happens in all phases. Each packet sent out by a node carries the node's current Q-value for the message type. All nodes are set to promiscuous listening mode. Whenever a node overhears a packet of type $m$, whether or not it is the designated receiver, it updates the corresponding NQ-value and re-estimates its own Q-value using the equation

$$Q_m \leftarrow (1 - \alpha) Q_m + \alpha \left( o_m + \min_n NQ_m(n) \right) \qquad (1)$$

where $\alpha$ is a learning rate, $o_m$ is the current value of the local objective function, and $n$ ranges over the neighbors of this node.
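Equation (1) as code, with a hypothetical node that has two neighbors. With the NQ-values held fixed and $0 < \alpha \le 1$, repeated updates converge geometrically to $o_m + \min_n NQ_m(n)$, the one-step cost-to-go through the best neighbor; the second part of the example shows this.

```python
from typing import Dict


def update_q(q: float, o: float, nq: Dict[str, float], alpha: float) -> float:
    """Equation (1): Q_m <- (1 - alpha) Q_m + alpha (o_m + min_n NQ_m(n))."""
    return (1.0 - alpha) * q + alpha * (o + min(nq.values()))


# The node overhears neighbor "a" advertising Q = 2.0 for message type m.
nq = {"a": 4.0, "b": 3.0}
nq["a"] = 2.0                        # update the corresponding NQ-value
q1 = update_q(q=5.0, o=1.0, nq=nq, alpha=0.5)
print(q1)                            # 0.5*5 + 0.5*(1 + 2) = 4.0

q = q1
for _ in range(50):                  # NQ-values held fixed
    q = update_q(q, o=1.0, nq=nq, alpha=0.5)
print(round(q, 6))                   # 3.0 = o_m + min NQ
```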
Using the Q-values, real-time search passes the packet to the “best” neighbor according to the estimates; constrained flooding decides whether and when to rebroadcast the packet according to the cost estimates; and adaptive spanning tree forwards the packet to the node's parent, where the parent pointer may change over time to point to the neighbor with the best Q-value. This approach has a number of attractive properties: (1) explicit use of destination and QoS specifications for finding optimal routes; (2) automatic adaptation to different routes when network conditions change; (3) no need for extra maintenance packets; and (4) no infinite looping if a path to the destination exists.



3.1 Meta-strategy 1: Real-Time Search 

The pseudocode of real-time search is shown in Figure 1, where $\hat{Q}_m(n)$ is an initial estimate for node $n$ based on the destination and QoS requirement of the message and the attribute values of this node [15]. Note that although this strategy is free of infinite loops, it is not loop-free: a packet may visit a node more than once. However, it has been proved that the maximum path length is $O(N^2)$ and that the strategy converges to the optimal path in $O(N^2)$, where $N$ is the number of nodes in the network. Due to space limitations, readers are referred to [15] for the theoretical bounds and variations of this meta-strategy.
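A minimal Python sketch of one node running the forwarding and confirmation phases of Fig. 1 for a single message type. It assumes the initial estimates $\hat{Q}_m$ are passed in by the caller and omits the destination check and the radio layer; the class and method names and the example values are hypothetical.

```python
import random
from typing import Dict, Optional


class RealTimeSearchNode:
    """State of one node for one message type m under real-time search."""

    def __init__(self, q_init: float, nq_init: Dict[str, float], o: float,
                 alpha: float = 0.5, seed: int = 0):
        self.q = q_init              # Q_m: own cost-to-go estimate
        self.nq = dict(nq_init)      # NQ_m(n): neighbors' estimates
        self.o = o                   # current local objective value o_m
        self.alpha = alpha           # learning rate
        self.rng = random.Random(seed)

    def _best_neighbor(self) -> str:
        best = min(self.nq.values())
        return self.rng.choice([n for n, v in self.nq.items() if v == best])  # random tie break

    def receive(self, sender: str, q_sender: float, designated: bool) -> Optional[str]:
        """Forwarding phase: learn from any overheard packet; forward if designated."""
        self.nq[sender] = q_sender
        self.q = (1 - self.alpha) * self.q + self.alpha * (self.o + min(self.nq.values()))
        return self._best_neighbor() if designated else None

    def timeout(self, v: str, resend: bool = True) -> Optional[str]:
        """Confirmation phase: v was not overheard relaying, so make it look worst."""
        self.nq[v] = max(self.nq.values()) + 1
        return self._best_neighbor() if resend else None


node = RealTimeSearchNode(q_init=5.0, nq_init={"a": 4.0, "b": 6.0, "c": 3.0}, o=1.0)
print(node.receive("a", 2.0, designated=True), node.q)  # a 4.0
print(node.timeout("a"))                                # c
```

Because a silent neighbor is pushed just above the current maximum rather than removed, it can be chosen again once the others' estimates grow, which is what lets the search recover without extra control packets.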



3.2 Meta-strategy 2: Constrained Flooding 

In contrast to search-based methods, where each node decides which of the neighboring nodes to forward the message to, flooding-based strategies decide at each node whether or not to broadcast. A few gradient-flooding strategies have been developed [12], but they require a cost field to be established beforehand or are specialized for geographical routing. We propose a constrained-flooding meta-strategy, where the cost, i.e., the Q-value, can be learned if not known a priori. Figure 2 illustrates the basic idea. As in other gradient-flooding routing protocols [12], the cost is transmitted together with the packet. In addition, the cost at each node is updated every time a packet is received; the update rule is the same as for search-based strategies. Two techniques are used here to control the flood: (1) cost difference: if the receiving node estimates a significantly higher cost than the transmitting node, it takes no action other than updating its cost field; and (2) time difference: a delay that depends on the cost difference is added before the broadcast, so that nodes with better estimates transmit first, while duplicate packets are suppressed. In this algorithm, a “temperature” variable $T$ controls the flood: the higher $T$, the higher the chance that a packet is broadcast.
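The forwarding rule of Fig. 2 can be sketched for a single node as follows. The names and parameter values are assumptions; clamping a negative delay $k(Q_m - NQ_m(u)) + \delta$ to zero is our own addition, since the figure does not say how such delays are handled, and the destination check is omitted.

```python
from typing import Dict, Optional, Set


class ConstrainedFloodingNode:
    """One node's constrained-flooding state for a single message type."""

    def __init__(self, q: float, nq: Dict[str, float], o: float,
                 alpha: float, T: float, k: float, delta: float):
        self.q = q                    # own cost estimate Q_m
        self.nq = dict(nq)            # neighbors' estimates NQ_m
        self.o = o                    # local objective value o_m
        self.alpha = alpha            # learning rate
        self.T = T                    # temperature: larger T floods more
        self.k = k                    # delay per unit of cost difference
        self.delta = delta            # constant delay offset
        self.queue: Set[str] = set()  # messages scheduled for rebroadcast

    def receive(self, msg_id: str, sender: str, q_sender: float) -> Optional[float]:
        """Return the rebroadcast delay, or None if this node stays silent."""
        # Learning happens on every reception, including duplicates.
        self.nq[sender] = q_sender
        self.q = (1 - self.alpha) * self.q + self.alpha * (self.o + min(self.nq.values()))
        if msg_id in self.queue:       # another node already relayed m: suppress
            self.queue.discard(msg_id)
            return None
        diff = self.q - self.nq[sender]
        if diff >= self.T:             # much costlier than the sender: stay silent
            return None
        self.queue.add(msg_id)
        # Smaller cost difference -> earlier broadcast; clamped at 0 (assumption).
        return max(0.0, self.k * diff + self.delta)


node = ConstrainedFloodingNode(q=5.0, nq={"u": 4.0}, o=1.0,
                               alpha=0.5, T=2.0, k=1.0, delta=0.1)
print(node.receive("m1", "u", 4.0))   # 1.1: rebroadcast scheduled
print(node.receive("m1", "w", 3.0))   # None: the overheard duplicate suppresses it
```

Raising `T` widens the band of cost differences that still trigger a rebroadcast, which matches the “temperature” interpretation above.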

If the destination is known a priori, which turns out to be the case for many routing applications in sensor networks, backward constrained flooding from the destination can be used initially to establish the cost field. If there is no initialization and the cost field is flat, one can set $T$ high initially and let it cool down as the cost field settles, to reduce collisions and save energy. This strategy has been briefly discussed in [14].
Forwarding phase:

received (m, Q) at w from node u do
  if new(m) then
    for all v with (v, w) ∈ E_m do NQ_m(v) ← Q̂_m(v); end
    Q_m ← Q̂_m(w);
  end
  if satisfied(C^d_m) then Q_m ← 0; broadcast(m, 0); return; end
  NQ_m(u) ← Q;
  Q_m ← (1 - α) Q_m + α (o_m + min_n NQ_m(n));
  if designated(m) then
    v ← argmin_n NQ_m(n); (random tie break)
    send(m, Q_m) to v;
  end
end

Confirmation phase:

timeout (m to v)
  NQ_m(v) ← max_n NQ_m(n) + 1;
  if (resend) then
    v ← argmin_n NQ_m(n); (random tie break)
    send(m, Q_m) to v;
  end
end



Fig. 1. Real-time search meta-strategy 



Forwarding phase:

received (m, Q) at w from node u do
  if new(m) then
    for all v with (v, w) ∈ E_m do NQ_m(v) ← Q̂_m(v); end
    Q_m ← Q̂_m(w);
  end
  if satisfied(C^d_m) then Q_m ← 0; broadcast(m, 0); return; end
  NQ_m(u) ← Q;
  Q_m ← (1 - α) Q_m + α (o_m + min_n NQ_m(n));
  if m is in transmit queue then remove m from transmit queue; return; end
  if ((Q_m - NQ_m(u)) < T) then
    broadcast(m, Q_m) to all neighbors after k(Q_m - NQ_m(u)) + δ time units;
  end
end



Fig. 2. Constrained-flooding meta-strategy
Initialization phase:

for all v do NQ_m(v) ← ∞; end
received (m, Q) at w from node u do
  NQ_m(u) ← Q; Q_m ← (1 - α) Q_m + α (o_m + min_n NQ_m(n));
  p'_m ← argmin_n NQ_m(n); (random tie break)
  if p'_m ≠ p_m then broadcast(m, Q_m); p_m ← p'_m; end
end

Forwarding phase:

received (m, Q) at w from node u do
  if satisfied(C^d_m) then Q_m ← 0; broadcast(m, 0); return; end
  NQ_m(u) ← Q; Q_m ← (1 - α) Q_m + α (o_m + min_n NQ_m(n));
  p_m ← argmin_n NQ_m(n); (random tie break)
  if designated(m) then send(m, Q_m) to p_m; end
end

Confirmation phase:

timeout (m to v)
  NQ_m(v) ← max_n NQ_m(n) + 1;
  p_m ← argmin_n NQ_m(n); (random tie break)
  if (resend) then send(m, Q_m) to p_m; end
end



Fig. 3. Adaptive spanning-tree meta-strategy 



3.3 Meta-strategy 3: Adaptive Spanning Tree 

This is a new strategy added to the family, using the same learning core. In cases 
where destinations (e.g., the base station) are known, it is often more efficient to 
build an initial spanning tree from the destination. Typical problems with this 
approach are that the initial tree may be suboptimal due to collisions during tree 
building, and that an optimal tree may become suboptimal over time due to the 
dynamic aspects of the network. Rebuilding a complete tree may also result in 
extra energy consumption and packet loss. 

In our framework, an adaptive spanning tree can be built using the same reinforcement learning core as in the previous two meta-strategies. The initialization phase builds an initial spanning tree. The forwarding phase passes the received packet to the node's parent. All nodes are set to promiscuous listening mode. Whenever a node hears a packet, whether or not it is the designated receiver, it updates the corresponding NQ-value and re-estimates its own Q-value, just as in the other two meta-strategies. As in real-time search, implicit packet confirmation is used: if the node to which the packet was forwarded is not overheard relaying it within a certain time period, that node's NQ-value is set to the largest among the neighbors plus one, and the parent pointer is reset to the neighbor with the minimum cost. The pseudocode is shown in Figure 3.
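The parent-pointer logic of Fig. 3 can be sketched as below. Starting from a finite initial estimate of $Q_m$ (our assumption; the figure only initializes the NQ-values to infinity) keeps the update in Eq. (1) well defined. `overhear` returns whether the parent changed, which is the condition for a rebroadcast in the initialization phase; the class, method names, and numbers are hypothetical.

```python
import math
import random
from typing import Dict, Iterable, Optional


class AdaptiveTreeNode:
    """One node's parent pointer for a single message type (sketch of Fig. 3)."""

    def __init__(self, neighbors: Iterable[str], q_init: float, o: float,
                 alpha: float = 0.5, seed: int = 0):
        self.nq: Dict[str, float] = {n: math.inf for n in neighbors}  # NQ_m(v) <- inf
        self.q = q_init              # finite initial Q_m estimate (assumed)
        self.o = o                   # local objective value o_m
        self.alpha = alpha           # learning rate
        self.parent: Optional[str] = None
        self.rng = random.Random(seed)

    def _argmin(self) -> str:
        best = min(self.nq.values())
        return self.rng.choice([n for n, v in self.nq.items() if v == best])  # random tie break

    def overhear(self, sender: str, q_sender: float) -> bool:
        """Learn from any overheard packet; return True if the parent changed."""
        self.nq[sender] = q_sender
        self.q = (1 - self.alpha) * self.q + self.alpha * (self.o + min(self.nq.values()))
        new_parent = self._argmin()
        changed = new_parent != self.parent
        self.parent = new_parent
        return changed

    def timeout(self, v: str) -> str:
        """v was not heard relaying the packet: make it the worst neighbor."""
        # If some NQ-value is still inf, v becomes inf as well.
        self.nq[v] = max(self.nq.values()) + 1
        self.parent = self._argmin()
        return self.parent


node = AdaptiveTreeNode(["a", "b"], q_init=10.0, o=1.0)
print(node.overhear("a", 3.0), node.parent)  # True a
print(node.overhear("b", 1.0), node.parent)  # True b
print(node.overhear("a", 2.0), node.parent)  # False b
print(node.timeout("b"))                     # a
```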
4 Evaluations of Meta-strategies

We have simulated the three meta-strategies for several real applications using 
Prowler [10], a probabilistic wireless network simulator. Prowler provides a radio 
fading model with packet collisions, static and dynamic asymmetric links, and 
a CSMA MAC layer. 

We use a real application to test the performance of all three meta-strategies 
and also compare them to AODV. The application, Pursuer Evader Game (PEG) 
[8], is to use the sensor network to detect an evader and to inform the pursuer 
about its location. The communication problem in this task is to route packets 
sent out by one of the sensor nodes to the mobile pursuer. The source is changing 
from node to node, following the movement of the evader, and the destination 
is mobile. The network is a 7 × 7 sensor grid with small random offsets. The maximum radio range is about $3d$, where $d$ is the distance between two neighboring nodes in the grid. Initially, the source is at the center and the destination is at the upper right corner. The evader and the pursuer are assumed to move at about the same speed, $0.2d$/s, and the source rate is 1 packet per second. The objective of MCBR in this application is simply the minimum number of hops.

The following performance metrics are used for comparing routing strategies 
in this paper: 

- latency: the time delay of a packet from the source to the destination;

- success rate: the ratio of the number of packets received at the destinations to the number of packets sent from the source;

- energy consumption: assuming each transmission consumes one energy unit, the total energy consumption equals the total number of packets sent in the network;

- energy efficiency: the ratio of the number of packets received at the destination to the total energy consumption in the network.
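A small sketch of how the four metrics could be computed from per-run counters. The `RunLog` structure and the numbers in the usage line are hypothetical, not results from the paper.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class RunLog:
    """Counters collected from one simulation run (field names are ours)."""
    sent_by_source: int                    # packets generated at the source
    transmissions: int                     # every packet sent by any node
    latencies: List[float] = field(default_factory=list)  # one per delivered packet

    def latency(self) -> float:
        return mean(self.latencies)

    def success_rate(self) -> float:
        return len(self.latencies) / self.sent_by_source

    def energy_consumption(self) -> int:
        return self.transmissions          # one energy unit per transmission

    def energy_efficiency(self) -> float:
        return len(self.latencies) / self.energy_consumption()


run = RunLog(sent_by_source=10, transmissions=40,
             latencies=[0.5, 1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 1.0])
print(run.latency(), run.success_rate(), run.energy_efficiency())  # 0.75 0.8 0.2
```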

Figure 4 shows the performance of four routing strategies: real-time search, constrained flooding, adaptive tree, and AODV. The results are averaged over 10 random runs. We can see that AODV has the shortest latency, but a low success rate, high energy cost, and low efficiency, while constrained flooding has the highest success rate and efficiency. Both constrained flooding and real-time search have been implemented on the Berkeley mote platform and tested in the real PEG with 49 motes; in practice, constrained flooding also works better. AODV performs worst in this application because the source changes very fast, whereas the learning-based strategies adapt to new situations quickly.



5 Conclusion and Future Work 

In this paper, we have presented three learning-based meta-strategies for MCBR. MCBR enables X-aware routing strategies, where X can be any attribute of the system, such as energy, signal strength, connectivity, or even sensor readings, or combinations thereof. The three meta-strategies all share the same reinforcement learning core. Despite its minimal overhead (no control packets are needed other than those of the initialization process), this routing scheme is highly adaptive to dynamic changes in a network. We have also implemented ant routing for sensor networks [16]. A comparison between these two types of learning will be presented in the future. We also plan to experiment with different QoS specifications, such as connectivity-aware or reliability-aware routing.

Fig. 4. Performance Evaluation: (a) Latency, (b) Success rates, (c) Energy consumption, (d) Energy efficiency
Abstract. In this paper, we formulate a new problem, namely the allocation of bandwidth in a two-level hierarchically structured market. In the top level, a unique seller allocates bandwidth to intermediate providers (e.g., Internet Service Providers (ISPs)), who in turn allocate their assigned shares of bandwidth to their own customers in the lower level. We present an efficient mechanism comprising auctions in both levels. We prove that, due to the structure of the mechanism and certain rules imposed by the top-level seller, the following dominant strategies apply: a) each lower-level customer truthfully reveals his demand in the auction in which he participates; b) each intermediary truthfully reveals to the top-level seller the aggregate demand in his respective market. Both the mechanism and the results extend to the case of more than two market levels.

Keywords: Auctions, bandwidth markets, efficiency. 



1 Introduction 

The evolution of network technology has enabled the development of new communication services that demand more and more bandwidth in an unpredictable way. The high competition and the lack of information about the demand for these services motivate the use of auctions to allocate bandwidth to the customers who value it the most. Several studies on allocating network resources by means of auctions have been published. Lazar and Semret propose in [5] and [6] the progressive second price (PSP) auction for the allocation of a divisible resource in a link and in a network of arbitrary topology, respectively. Maille and Tuffin propose in [7] the one-shot multi-bid auction scheme for the allocation of a divisible resource in a link; this scheme is related to the PSP auction. They extend the multi-bid auction to a special case of a network [8]. Courcoubetis et al. present in [3] a descending auction mechanism for bandwidth allocation over paths, where bids are placed simultaneously and independently in each link.
In the general case of a business model, there may exist multiple levels in the process of providing bandwidth. Indeed, in the presence of large and distributed sets of potential buyers, their direct transaction with the seller is either impossible or entails high computational and physical or communication overheads. This motivates a hierarchical business model. In this paper, we formulate a new resource allocation problem. In particular, we deal with a two-level hierarchical business model for selling $C$ units of bandwidth in a single link. In the top level, the social planner allocates bandwidth to a set of $M$ intermediate providers (e.g., Internet Service Providers), who in turn allocate their assigned shares of bandwidth to a set of $N$ ($N > M$) customers in the lower level. The social planner imposes certain allocation and payment rules for both levels. Our objective is the efficient overall allocation of the entire supply of bandwidth to the customers, as if the social planner were to assign this bandwidth directly; hence the use of the term social planner, rather than profit-seeking seller.

Despite its hierarchical structure, the above allocation problem cannot be solved efficiently in two independent stages, one for each level. The social planner cannot sell the bandwidth to the providers before they acquire knowledge of the demand they will face from their customers, while the providers cannot trade with their customers before they learn the amount of bandwidth they obtain from the social planner. Thus, a dynamic mechanism enforcing coordination and exchange of information between the two levels is required. In light of this necessity, we propose an innovative mechanism comprising an appropriate auction in each level. The auctions are coordinated so that supply is exhausted at the end, as required for attaining efficiency. The service providers and the customers are expected to act according to their own incentives, i.e., so that their respective benefits are maximized, without being concerned about social welfare. Nevertheless, our mechanism is specified so that bidders in both auctions have the incentives both to participate uninhibitedly and to bid truthfully (incentive compatibility property) in an environment where each player possesses only a certain part of the information on the entire market. In particular, we assume that customers’ preferences are privately known to them, that providers have no prior information about their local markets, and that the social planner may only interact with providers. Moreover, the rules of the mechanism are announced to all players by the social planner, but the outcomes of the bidding process in both levels are only released after the end of the procedure.

2 Problem Formulation — Analysis of a Simple Case 

We assume that the quantity $C$ of bandwidth available in the top level is known to all players, and that customers are partitioned into fixed groups, each of which constitutes the local market of a certain provider. We denote the local market of provider $j$ as the set of customers $S_j$, for $j = 1, \ldots, M$. Each customer $i$ receives marginal utility $\theta_{i,k}$ from his $k$-th allocated unit of bandwidth, for $k = 1, \ldots, C$. We assume that the marginal utilities of each customer are diminishing and privately known to him. A customer knows neither the utility functions of other customers, nor the distribution these utilities are drawn from, nor even the quantity to be offered in his local market. In contrast to customers, providers do not know their own valuations. Indeed, provider $j$'s marginal utility $u_{j,k}$ is assumed to equal the revenue he would obtain if he were to sell the $k$-th unit of bandwidth after the trade. Thus, it cannot be determined or predicted without knowledge of market demand, which is taken to be completely unknown. The objective of the social planner is to maximize social welfare, which measures the overall well-being of the society. In our setting, it is given by the formula

$$SW(x) = (\theta_{1,1} + \cdots + \theta_{1,x_1}) + \cdots + (\theta_{N,1} + \cdots + \theta_{N,x_N}),$$

where $x = (x_1, \ldots, x_N)$ is the vector of the quantities allocated to the $N$ customers. In the case of complete information, the optimal allocation is determined by ordering the $\theta_{i,k}$'s over all customers and selecting the largest $C$ ones. It is necessary that we employ only efficient mechanisms in both levels. Still, this is not a sufficient condition for efficiency in our problem. To illustrate this, we first analyze the case of selling a single unit of bandwidth, which provides considerable insight into the issues that arise in the general case too. Thus, suppose that the social planner wants to sell a single good (i.e., unit of bandwidth) to one of $N$ customers with valuations (i.e., utilities) $\theta_i$ through $M$ providers. Efficiency is attained if the good is ultimately sold to the customer with the highest valuation. Below we examine two combinations of well-known mechanisms, only one of which attains efficiency under the assumption of privately known valuations.
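Under complete information, the optimal allocation is a simple greedy selection. The sketch below uses the valuations of the example given later in Sect. 3; the function name is our own.

```python
import heapq
from typing import List, Tuple


def optimal_allocation(theta: List[List[float]], C: int) -> Tuple[List[int], float]:
    """Complete-information optimum: keep the C largest theta_{i,k} overall.

    Because each customer's marginal utilities are diminishing, the selected
    values of customer i are always his first x_i values.
    """
    units = [(v, i) for i, row in enumerate(theta) for v in row]
    chosen = heapq.nlargest(C, units)
    x = [0] * len(theta)
    for _, i in chosen:
        x[i] += 1
    return x, sum(v for v, _ in chosen)


x, sw = optimal_allocation([[10, 4, 3], [12, 6, 2], [11, 9, 1], [9, 8, 7]], C=8)
print(x, sw)   # [1, 2, 2, 3] 72
```

The catch, as the rest of this section shows, is that in the hierarchical market no single party holds all the $\theta_{i,k}$ needed to run this selection.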

First, we deal with the combination of Vickrey auctions in both levels. The lower-level auction in each market is performed first, so that the respective provider learns his utility, i.e., his revenue if he wins the good in the top-level auction; this revenue equals the second highest bid among the bids placed by his customers. Then, the providers participate in the top-level auction according to this utility. All players (in both levels) bid truthfully, due to the Vickrey payment rule, but the outcome may not be efficient. That is, the good may not be sold to the provider whose market contains the customer with the overall highest valuation. For example, assume two providers A and B having two and three customers respectively, with valuations $\theta_1 = 2$, $\theta_2 = 7$ (provider A) and $\theta_3 = 5$, $\theta_4 = 6$, $\theta_5 = 3$ (provider B). Providers A and B bid 2 and 5 respectively. Provider B wins the good at price 2 and sells it to customer 4 at price 5, which is not the efficient outcome, since customer 2 has the highest valuation. Even though the auctions in both levels are efficient, overall efficiency is not attained because no local market's valuation (i.e., the maximum valuation among the market's customers) is reflected in the top-level auction. Each provider derives his own valuation from his revenue, as explained above, which differs from his local market's valuation. The same results hold when the English auction is applied independently in the two levels, starting from the lower one: each provider learns his potential revenue, which again equals the second highest valuation of his customers.
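The failure of two independent Vickrey stages can be reproduced in a few lines. The sketch follows the example above; representing each market as a dictionary from customer index to valuation is an assumption, and a single-bidder Vickrey auction is assumed to clear at a reserve price of 0.

```python
from typing import Dict, List, Tuple


def second_highest(values: List[float]) -> float:
    """Vickrey price; 0 if there is a single bid (assumed reserve of 0)."""
    s = sorted(values, reverse=True)
    return s[1] if len(s) > 1 else 0.0


def two_stage_vickrey(markets: Dict[str, Dict[int, float]]) -> Tuple[str, float, int, float]:
    """Lower-level Vickrey auctions first, then a top-level Vickrey auction.

    Each provider bids his potential revenue, i.e. the second highest
    valuation in his own market. Returns (provider, top price, customer, price).
    """
    bids = {j: second_highest(list(m.values())) for j, m in markets.items()}
    winner = max(bids, key=bids.get)
    top_price = second_highest(list(bids.values()))
    local = markets[winner]
    customer = max(local, key=local.get)
    return winner, top_price, customer, bids[winner]


markets = {"A": {1: 2, 2: 7}, "B": {3: 5, 4: 6, 5: 3}}
print(two_stage_vickrey(markets))   # ('B', 2, 4, 5)
```

The good ends up with customer 4 even though customer 2 values it most, because the top level only ever sees the second highest valuation of each market.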

Suppose now that we apply the English auction in both levels simultaneously as follows: a common price clock ascends continuously in each level. At every price, each customer either accepts the offer (e.g., by pressing a button) according to his valuation, or withdraws from the auction. Regarding each provider’s strategy, it is meaningful to accept the offer at the same price if at least one of his customers accepts it; otherwise, he withdraws from the top-level auction. This auction terminates at the first price where only one of the providers still accepts the offer. This provider is the winner and pays the current price. His market’s auction continues until only one of his customers still accepts the offer. This customer is the final winner and pays the final price of his local market. The winning provider surely ends up with non-negative profit, because his buy price is always less than or equal to his sell price. The mechanism is efficient, since as the price increases only the customer with the highest valuation remains active. The provider’s strategy described above is a weakly dominant strategy: if he withdraws instead of accepting the offer at a given price where some of his customers are still active, then he ends up with zero profit; if he accepts instead of withdrawing, then he may end up with a negative or zero profit.
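Under the strategies just described, the outcome of the simultaneous clock auction for one unit has a closed form: each provider drops out at his market's highest valuation, and the winning market's clock cannot stop before the top-level clock does. The sketch below computes it, ignoring ties and assuming at least two providers; the second market configuration is hypothetical and shows a case where the provider earns a positive profit.

```python
from typing import Dict, Tuple


def simultaneous_english(markets: Dict[str, Dict[int, float]]) -> Tuple[str, float, int, float]:
    """Closed-form outcome of the simultaneous ascending clock for one unit.

    A provider stays in while any of his customers stays in, so he drops out
    at his market's highest valuation. Returns (provider, buy price,
    customer, sell price).
    """
    market_value = {j: max(m.values()) for j, m in markets.items()}
    ranked = sorted(market_value, key=market_value.get, reverse=True)
    winner = ranked[0]
    buy_price = market_value[ranked[1]]      # clock stops when the runner-up quits
    local = sorted(markets[winner].items(), key=lambda kv: kv[1], reverse=True)
    customer = local[0][0]
    # The local clock continues until only one customer remains active.
    runner_up = local[1][1] if len(local) > 1 else buy_price
    sell_price = max(buy_price, runner_up)
    return winner, buy_price, customer, sell_price


print(simultaneous_english({"A": {1: 2, 2: 7}, "B": {3: 5, 4: 6, 5: 3}}))  # ('A', 6, 2, 6)
print(simultaneous_english({"A": {1: 7, 2: 6}, "B": {3: 5, 4: 3}}))        # ('A', 5, 1, 6)
```

In both cases the unit reaches the customer with the highest valuation, and the sell price is never below the buy price.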

It is intuitively clear that a necessary condition for overall efficiency is that 
each provider submits truthfully his market demand in the top level auction. 
Thus, the social planner has to design the whole mechanism so that each provider 
maximizes his profits by transferring his market demand to the top level. A
requirement for this is that all players obtain non-negative profits at the end;
negative profits raise participation issues and generate incentives that result in 
inefficiencies. Since customers know their own valuations and act rationally, there 
is no possibility of obtaining negative profits in any mechanism. But this is not 
the case for the providers, since they participate in two different trades. Thus, 
depending on the mechanism, there may exist a possibility of price inconsistency 
for a provider; i.e. to buy certain units of bandwidth at prices that are higher than 
the corresponding selling prices. Such cases do not arise with our mechanism. 



3 The Hierarchical Auction Mechanism 

We propose a synchronized mechanism in the top and lower levels of our hierarchy for selling $C$ indivisible units of bandwidth to $N$ customers through $M$ providers. The Ascending Clock Auction with Clinching (ACC) introduced by Ausubel in [1] is performed in both levels. In the lower level, however, we introduce a new allocation rule, which makes use of the outcome of the top-level auction; this rule is discussed below. The price clock is common to the two auctions; it starts at a reserve price and increases continuously until the end of the process at a final price $p_{final}$. Players of both levels bid for quantities at every price according to their strategies. We assume that no information about his opponents’ bids is available to any player during the bidding process. Each customer observes price $p$ and decides whether or not to submit a new bid at this price. A pure strategy for customer $i$ is a function $X_i: \mathbb{R} \to \mathbb{N}$, where $X_i(p)$ denotes the quantity demanded at price $p$. Each customer’s bids have to be non-increasing as the price ascends. Each provider observes his own customers’ demanded quantities at price $p$ and uses this information to calculate his bid for the top-level auction, as we discuss in Sect. 4. A pure strategy for provider $j$ is a function denoting the quantity this provider demands in the top-level auction at price $p$. For each price $p$, this depends on the vector of demanded quantities in the provider’s local market. In order to simplify notation, we denote the bid at price $p$ as $Q_j(p)$ and the bidding function as $Q_j: \mathbb{R} \to \mathbb{N}$. Again, provider $j$'s bids have to be non-increasing as the price ascends. We will determine the dominant strategies of all players in the next section. The procedure terminates at the first time demand equals supply in the top-level auction. The evolution of the mechanism is given completely by the set of prices $p^l$, $l = 1, \ldots, L$, corresponding to the occasions on which one or more players (in either of the two levels) strictly decreased their quantity. Next, we define the allocation and payment rules for the two levels, using the notation of [1].

TOP LEVEL AUCTION: Let $q^l_j$ denote the quantity demanded by provider $j$ at the $l$-th occasion. Due to clinching in the top-level auction, at each price $p^l$, each provider has already guaranteed a quantity for his respective market. The quantity $C^l_j$ clinched up to (and including) price $p^l$ by provider $j$ is given by:

$$C^l_j = \max\Big\{0,\; C - \sum_{k \neq j} q^l_k\Big\}, \quad l = 1, \ldots, L, \; j = 1, \ldots, M. \qquad (1)$$

After the bidding process is completed, the social planner announces to each provider the quantity he wins and his charge. In particular, each provider obtains the quantity he demanded at the final price $p_{final}$ and pays for each unit of bandwidth the standing price at which he clinched this unit, as suggested by the ACC auction. Formally, the outcome of the top-level auction is defined by:

$$\text{Allocation of provider } j: \quad q^*_j = C^L_j = q^L_j, \quad j = 1, \ldots, M, \qquad (2)$$

$$\text{Payment of provider } j = \sum_{l=0}^{L} p^l \cdot \big(C^l_j - C^{l-1}_j\big), \quad j = 1, \ldots, M. \qquad (3)$$
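A sketch of the top-level rules (1)-(3), applied to a hypothetical sequence of occasions with $C = 2$ and two providers. It assumes that $C^{-1}_j = 0$ and that bids are non-increasing across occasions, as required above; the function name is our own.

```python
from typing import List, Sequence, Tuple


def top_level_clinch(C: int, prices: Sequence[float],
                     demands: Sequence[Sequence[int]]) -> Tuple[List[int], List[float]]:
    """Clinching allocation and payments, following eqs. (1)-(3).

    prices[l]  : clock price p^l at occasion l = 0..L
    demands[l] : provider bids (q^l_1, ..., q^l_M) at p^l, non-increasing in l
    """
    M = len(demands[0])
    prev = [0] * M                     # C^{-1}_j = 0: nothing clinched yet
    pay = [0.0] * M
    for p, q in zip(prices, demands):
        total = sum(q)
        # Eq. (1): j clinches whatever the others' demand leaves over.
        clinched = [max(0, C - (total - q[j])) for j in range(M)]
        for j in range(M):
            pay[j] += p * (clinched[j] - prev[j])   # eq. (3): standing price
        prev = clinched
    return prev, pay                   # eq. (2): final allocation C^L_j


alloc, pay = top_level_clinch(C=2, prices=[1, 3, 5],
                              demands=[[2, 2], [2, 1], [1, 1]])
print(alloc, pay)   # [1, 1] [3.0, 5.0]
```

Provider 1 clinches his unit at price 3, as soon as provider 2 reduces his demand, while provider 2 only secures his unit at the final price 5.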

LOWER LEVEL AUCTION: Let $x^l_i$ denote the quantity demanded by customer $i$ at the $l$-th occasion. For the allocation and payment by the customers we introduce a new clinching rule: at each price $p^l$, and in each local market of provider $j$, the condition for determining the quantity to be clinched by the various customers employs the already guaranteed supply $C^l_j$ in this market. That is, as long as a provider clinches new units of bandwidth in the top-level auction, he is required to make them available in the lower-level auction of his own market. This is different from performing the original ACC auction for the provider's final quantity $q^*_j$. We denote by $B^l_i$ the quantity clinched by customer $i$ up to (and including) price $p^l$; it is given by:

$$B^l_i = \max\Big\{0,\; C^l_j - \sum_{k \in S_j,\, k \neq i} x^l_k\Big\}, \quad l = 1, \ldots, L, \; i = 1, \ldots, N, \qquad (4)$$

where $C^l_j$ is the supply offered at the $l$-th occasion in customer $i$'s local market $j$. Each customer wins the quantity demanded at the final price $p_{final}$ of the top-level auction and is charged according to two restrictions, which are part of the definition of our mechanism: i) each customer pays the standing prices at which he clinched the won units of bandwidth, as defined in the modified process above; ii) no unit of bandwidth is sold at a price higher than $p_{final}$. Formally, the outcome of the lower-level auction is defined by:

$$\text{Allocation of customer } i: \quad x^*_i = B^L_i = x^L_i, \quad i = 1, \ldots, N, \qquad (5)$$

$$\text{Payment of customer } i = \sum_{l=0}^{L} \min\{p^l, p_{final}\} \cdot \big(B^l_i - B^{l-1}_i\big), \quad i = 1, \ldots, N. \qquad (6)$$

Table 1. Hierarchical auction in the example
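The modified lower-level rule differs from the top level only in that the supply $C^l_j$ grows over the occasions and each unit is charged $\min\{p^l, p_{final}\}$. The sketch uses a hypothetical supply trajectory for one provider with two customers; the function name is our own.

```python
from typing import List, Sequence, Tuple


def lower_level_clinch(supply: Sequence[int], prices: Sequence[float],
                       demands: Sequence[Sequence[int]], p_final: float
                       ) -> Tuple[List[int], List[float]]:
    """Modified clinching in one provider's market, eqs. (4)-(6).

    supply[l]  : C^l_j, units the provider has clinched up to occasion l
    demands[l] : quantities x^l_i of the provider's customers at p^l
    """
    n = len(demands[0])
    prev = [0] * n                     # B^{-1}_i = 0
    pay = [0.0] * n
    for s, p, x in zip(supply, prices, demands):
        total = sum(x)
        clinched = [max(0, s - (total - x[i])) for i in range(n)]   # eq. (4)
        for i in range(n):
            pay[i] += min(p, p_final) * (clinched[i] - prev[i])    # eq. (6)
        prev = clinched
    return prev, pay                   # eq. (5): final allocation B^L_i


alloc, pay = lower_level_clinch(supply=[2, 3, 4, 5], prices=[1, 2, 3, 4],
                                demands=[[2, 3]] * 4, p_final=4)
print(alloc, pay)   # [2, 3] [7.0, 9.0]
```

Even though neither customer ever reduces his demand here, both clinch units as the provider's guaranteed supply grows, which is exactly how the new rule passes top-level clinching down to the lower level.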

In the example below, we apply the proposed mechanism assuming that each 
customer bids truthfully and each provider bids the aggregate demand of his 
local market observed at each price. In Sect. 4 we will prove that these are 
indeed the dominant strategies for the customers and the providers respectively. 

Example. Suppose there are $C = 8$ units of bandwidth to be allocated to two service providers A and B, each of which has two customers, with marginal valuations $\theta_1 = (10, 4, 3)$ [$= (\theta_{1,1}, \theta_{1,2}, \theta_{1,3})$], $\theta_2 = (12, 6, 2)$, $\theta_3 = (11, 9, 1)$ and $\theta_4 = (9, 8, 7)$, respectively. The efficient outcome is given by the vector $x^* = (x_1^*, x_2^*, x_3^*, x_4^*) = (1, 2, 2, 3)$ and the optimal social welfare is $SW(x^*) = 10 + 12 + 6 + 11 + 9 + 9 + 8 + 7 = 72$. We apply the proposed procedure, letting the price start at 1. Table 1 presents the bids at both levels at the prices of the various occasions. Note that all auctions terminate simultaneously and the final allocation is the efficient one. After the procedure has terminated, the social planner and the providers calculate their allocations and payments, as shown in Table 1.
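The example can be replayed with a small simulation of both auction levels. This is a minimal sketch under explicit assumptions that the text does not fix: customers 1 and 2 belong to provider A and customers 3 and 4 to provider B, prices are integers rising by 1 from 1, and a truthful customer bids the number of units whose marginal value strictly exceeds the price, as in (7). The names `demand` and `hierarchical_auction` are hypothetical, and the run is not claimed to reproduce the entries of Table 1.

```python
def demand(theta, p):
    """Truthful bid, cf. eq. (7): units whose marginal value strictly exceeds p."""
    return sum(1 for v in theta if v > p)


def hierarchical_auction(C, markets, start_price=1, step=1):
    """Run both auction levels on a common, increasing price grid.

    markets maps a provider name to the marginal-valuation vectors of his
    customers. Providers bid their local demand, capped at C.
    """
    customers = [(j, i) for j in markets for i in range(len(markets[j]))]
    prov_clinched = dict.fromkeys(markets, 0)
    prov_pay = dict.fromkeys(markets, 0)
    cust_clinched = dict.fromkeys(customers, 0)
    cust_pay = dict.fromkeys(customers, 0)
    p = start_price
    while True:
        x = {(j, i): demand(markets[j][i], p) for (j, i) in customers}
        local = {j: sum(x[j, i] for i in range(len(markets[j]))) for j in markets}
        Q = {j: min(C, local[j]) for j in markets}
        for j in markets:
            # Top level: provider j clinches what his rivals leave unclaimed.
            a = max(0, C - sum(Q[k] for k in markets if k != j))
            prov_pay[j] += p * (a - prov_clinched[j])
            prov_clinched[j] = a
            # Lower level, eq. (4): the local supply C_j^l is what j has clinched so far.
            for i in range(len(markets[j])):
                b = max(0, a - (local[j] - x[j, i]))
                # The loop stops at p_final, so min(p, p_final) in (6) is just p here.
                cust_pay[j, i] += p * (b - cust_clinched[j, i])
                cust_clinched[j, i] = b
        if sum(Q.values()) <= C:   # top-level demand meets supply: all auctions stop
            return p, cust_clinched, cust_pay, prov_pay
        p += step


markets = {"A": [(10, 4, 3), (12, 6, 2)], "B": [(11, 9, 1), (9, 8, 7)]}
p_final, alloc, cust_pay, prov_pay = hierarchical_auction(8, markets)
print(p_final, alloc, cust_pay, prov_pay)
```

Under these assumptions the run stops at price 4 with the efficient allocation (1, 2, 2, 3). Customers pay 2, 7, 7 and 9 (25 in total) to their providers, while the providers pay 3 (A) and 11 (B) to the planner, so the providers' combined profit is 11.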

4 Derivation of Players’ Strategies and Efficiency 

In this section, we analyze players’ strategies and prove that all the objectives 
set by the social planner are met when the proposed mechanism is applied. 
Regarding players' strategies, first note the following: providers wish to maximize their profits from participating in two trading markets that interact with each other only through their own actions. Their strategy involves a buy process at the top level and a sell process at the lower level. Since the social planner is assumed to have pre-specified the mechanism at both levels, including the charging rule at the lower level, the provider's strategy reduces to his bidding strategy at the top level, based on the information progressively revealed to him in his own local market auction. The key factor determining his optimal bidding strategy in the top-level auction is his utility function. If the demand in his local market were known and he had the authority to define the payment rule for his own customers, he could calculate his utility function completely and bid truthfully according to it in the top-level auction. In our setting, however, local demand is not known and the payment rule is not determined by the provider. Demand is revealed step by step, from lower prices (higher quantities) to higher prices (lower quantities). At each price, the information available to the provider is the demand up to that price. Each customer, on the other hand, takes part in the local market auction and wishes to maximize his net benefit according to his known utility function.

We henceforth restrict attention to: i) the following strategy of providers: bid the quantity indicated by local demand if it does not exceed the capacity $C$, otherwise bid $C$; and ii) the following strategy of customers: bid the quantity indicated by the customer's utility function at each price $p$. Recalling that $Q_j(p)$ and $X_i(p)$ denote the bidding strategies of providers and customers respectively, we have:

$$Q_j(p) = \min\Big\{C, \sum_{i \in S_j} X_i(p)\Big\}, \qquad X_i(p) = \begin{cases} \max\{k : \theta_{i,k} > p\}, & \text{if } \theta_{i,1} > p,\\ 0, & \text{otherwise}, \end{cases} \qquad (7)$$

where $S_j$ denotes the set of customers of provider $j$.

We claim that provider $j$ maximizes his profit by adopting $Q_j$ and customer $i$ maximizes his net benefit by adopting $X_i$, regardless of the other players' strategies.

Proposition 1. Demand revelation by every provider constitutes a weakly dominant strategy.

Proof. Suppose that all providers except $j$, as well as all customers, bid according to arbitrary but fixed strategies. We will prove that provider $j$ maximizes his profit by revealing his local demand. Suppose that at a given price, provider $j$ bids a quantity less than this demand. Since the other providers and provider $j$'s customers do not deviate from their strategies, the procedure remains the same except for termination, which is reached earlier, i.e., at a lower price. Indeed, provider $j$ will win no more bandwidth units than he would if he had revealed his demand. Due to the second restriction of the payment rule in the lower level, all units will be sold at a price no higher than this new termination price. This implies that:

1. Provider $j$'s buy price for all bandwidth units won is the same as in the case where his bid equals his demand at each price → no extra profit.

2. His sell price for bandwidth units sold up to the new market-clearing price is the same → no extra profit from these units.

3. His sell price for bandwidth units that would have been sold at higher prices is now the new market-clearing price, which is lower than the original one → loss.

4. He may fail to win some units of bandwidth that could have been sold at a non-negative profit → possible loss.

Conversely, suppose that at a certain price provider $j$ bids a quantity higher than his demand. Termination of the process will be delayed, which implies that:

1. Provider $j$ may obtain extra units that he will not be able to sell → possible loss.

2. His buy price for units other than the extra ones does not change, since the other providers do not change their bids → no extra profit.

3. Units other than the extra ones will not be sold at higher prices, since his customers do not change their bids → no extra profit.

Thus, provider j should bid his local demand, regardless of the other players’ 
strategies. □ 

Proposition 2. Truthful bidding by every customer constitutes a weakly dominant strategy.

Proof. Suppose that all providers bid according to arbitrary but fixed strategies. We will prove that every customer maximizes his net benefit by bidding truthfully. Indeed, the number of bandwidth units a customer clinches at each price is independent of his own bids: the units of bandwidth and the prices at which he gains them depend on the bids of his competitors in the same market and on local supply, which is determined by the other providers' bids. Hence, if at a certain price customer $i$ reports a higher quantity than the true one, he might win an extra unit at a price higher than his corresponding marginal value, thus incurring a loss. Conversely, if customer $i$ reports a lower quantity than the true one, he loses an extra unit with positive net benefit, without obtaining a lower price for any of his other units won. Thus, customer $i$ bids truthfully. □

Corollary 1. If each provider’s strategy is demand revelation, then all auctions 
terminate simultaneously when demand equals supply in the top-level auction. 

Outline of proof. Let $p_{\text{final}}$ be the price at which demand equals supply in the top-level auction. No lower-level auction terminates later than $p_{\text{final}}$, since local market demand equals the respective local market supply at $p_{\text{final}}$. Additionally, no lower-level auction terminates at a lower price than $p_{\text{final}}$, since each provider sells at least one unit at price $p_{\text{final}}$; otherwise the whole procedure would have terminated before $p_{\text{final}}$. □

Remark. It is the second restriction of the payment rule in the lower level (namely, no unit is sold at a price higher than the final price) that makes demand revelation a weakly dominant strategy. Had it been omitted, providers would shade demand to their benefit: lower-level auctions would terminate (not simultaneously) at higher prices, yielding more revenue per won unit, while the top-level auction would terminate at a lower price, yielding smaller charges.

In the sequel, we explain why efficiency is attained. The selected mechanisms at both levels lead to efficient outcomes if considered independently. As discussed in Sect. 2, this is not enough to achieve overall efficiency in our hierarchical allocation problem. Revelation of local demand by every provider is a necessary condition for the mechanism to be efficient. This implies that provider $j$'s marginal utility (revenue from the $k$th additional unit) should equal the local market's marginal utility $V_{j,k}$. Our mechanism satisfies the condition that the final price (which is the maximum price at which a won unit can be sold) equals each provider's marginal utility for the last demanded unit of bandwidth. Indeed, since the providers reveal demand, each of them sells at least one unit at the final price $p_{\text{final}}$ (see the proof of Corollary 1). Thus, the final price equals each provider's revenue for the last unit of bandwidth, that is, his marginal utility. The provider need not reveal all of his demand; revealing the part from low prices up to the final price is enough to achieve efficiency. Our mechanism has the important property that the social welfare attained in a market with intermediaries is the same as under direct allocation by the social planner. More interesting is the fact that customers' net benefits are identical in both cases. However, the social planner incurs a loss that is conveyed as profit to the providers. This is proved in the next proposition:

Proposition 3. Allocation and payments for each customer are the same whether the trade is performed directly and efficiently by the social planner or hierarchically by means of our mechanism.

Proof. Assume first that the social planner allocates $C$ units of bandwidth directly to the customers. Let $p_1$ be the price at which customer 1 clinches his first unit of bandwidth. Then $p_1$ is the smallest value that satisfies the condition:

$$C - \sum_{i \neq 1} q_i(p_1) = 1 \;\Longleftrightarrow\; C - \sum_{i=1}^{N} q_i(p_1) + q_1(p_1) = 1, \qquad (8)$$

where $q_i(p_1)$ is the bid of customer $i$ at price $p_1$, $i = 1,\dots,N$. We will prove that in the hierarchical auction we propose, customer 1 also clinches his first unit of bandwidth at price $p_1$. Suppose there are two providers A and B, and customer 1 belongs to the set $S_A$ of customers of provider A. Let $Q_A(p)$, $Q_B(p)$ be the bids of providers A and B respectively, and $a_A(p)$ the quantity already clinched by provider A at price $p$. Let $p_1'$ be the price at which customer 1 now clinches his first unit of bandwidth. Then $p_1'$ is the smallest value that satisfies the condition:

$$a_A(p_1') - \sum_{i \in S_A,\, i \neq 1} q_i(p_1') = 1 \;\Longleftrightarrow\; a_A(p_1') - Q_A(p_1') + q_1(p_1') = 1. \qquad (9)$$

Clearly, $a_A(p) = C - Q_B(p)$ for every price $p$. Combining this with (9), we obtain

$$C - Q_B(p_1') - Q_A(p_1') + q_1(p_1') = 1 \;\Longleftrightarrow\; C - \sum_{i=1}^{N} q_i(p_1') + q_1(p_1') = 1. \qquad (10)$$

Combining this with (8), we obtain $p_1 = p_1'$. Similarly, it is easily seen that all the remaining units clinched by customer 1 in a direct ACC are clinched at the same prices in the hierarchical auction, giving rise to the same allocation and payments for the customer in both cases. □
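Proposition 3 can be checked on the example of Sect. 3 by running a direct ACC, with the social planner facing the four customers directly. The sketch below makes the same illustrative assumptions throughout: integer prices rising by 1 from 1, and truthful bids that count the marginal values strictly above the price, as in (7). The name `direct_acc` is hypothetical.

```python
def demand(theta, p):
    """Truthful bid: number of units whose marginal value strictly exceeds p."""
    return sum(1 for v in theta if v > p)


def direct_acc(C, thetas, start_price=1, step=1):
    """Clinching auction run directly by the social planner on a discrete price grid."""
    n = len(thetas)
    clinched = [0] * n
    pay = [0] * n
    p = start_price
    while True:
        q = [demand(th, p) for th in thetas]
        total = sum(q)
        for i in range(n):
            # Units clinched: supply not demanded by the rivals, cf. the left side of (8).
            b = max(0, C - (total - q[i]))
            pay[i] += p * (b - clinched[i])
            clinched[i] = b
        if total <= C:
            return p, clinched, pay
        p += step


p_final, alloc, pay = direct_acc(8, [(10, 4, 3), (12, 6, 2), (11, 9, 1), (9, 8, 7)])
print(p_final, alloc, pay, sum(pay))   # 4 [1, 2, 2, 3] [2, 7, 7, 9] 25
```

Selling directly, the planner would collect 25. By Proposition 3, customers pay the same amounts in the hierarchical mechanism, but to their providers. The planner's shortfall relative to 25 is therefore exactly the providers' profit, as stated before the proposition.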

Corollary 2. The hierarchical auction mechanism yields the efficient outcome. 

Proof. Allocation of bandwidth to customers under our mechanism is the same as under ACC, which is efficient. Hence, our mechanism is efficient. □

5 Concluding Remarks 

In this article, we focus on strategic interactions among sellers, retailers and potential buyers. We propose and analyze a new auction mechanism for selling bandwidth efficiently in a hierarchically structured market. We take advantage of the distribution of information over the parties involved and coordinate the various trades that have to take place, so that no one has an incentive to deviate from bidding truthfully. The mechanism can be applied in other cases too, such as the hierarchical trading of units of other services (e.g., call minutes), or the trading of bandwidth of an overprovisioned backbone link (top-level trade) that interconnects with uncongested access networks (lower-level trade), considered by Maille and Tuffin in [8]. Regarding implementation, recall that, according to the presentation of the mechanism in Sect. 3, the auctions of both levels are performed simultaneously. However, the lower-level auctions can alternatively be performed asynchronously and prior to the top-level auction, until the total demand equals the total available bandwidth $C$. This implementation is simpler and more appropriate for practical cases.
Abstract. Pricing is considered a relevant way to control congestion and differentiate services in communication networks. Among all pricing schemes, auctioning for bandwidth has received a lot of attention. In this paper, we compare a recently designed auction scheme called the multi-bid auction with the often-referenced progressive second price auction. We especially focus on the case of a stochastic environment, with players/users entering and leaving the game. We illustrate the gain that can be obtained with multi-bids in terms of complexity, revenue and social welfare, in both the transient and steady-state regimes.



1 Introduction 

To cope with congestion in communication networks, it has been proposed to switch from current flat-rate pricing to usage-based or congestion-based pricing schemes (see for instance [3,4] for surveys on pricing in telecommunication networks, describing the range of possibilities; for the sake of conciseness, we do not describe all the schemes here). Among those pricing schemes, auctioning has emerged as a way to share bandwidth. Auctioning was first proposed in the seminal smart-market scheme of MacKie-Mason and Varian [6], where each packet contains a bid and, if served, pays the highest bid among the packets that are denied service. This scheme has a high engineering cost, but it pioneered auction-based pricing in the networking community.

The Progressive Second Price (PSP) auction [5,11,12] has recently been proposed as a trade-off between engineering feasibility and economic efficiency. In PSP, players submit bids at different epochs, each bid consisting of the required amount of bandwidth and the associated unit price, until a (Nash) equilibrium is reached. The scheme has been proved to be incentive compatible and efficient. Variants of PSP have been designed in [8,14] to fix some of its drawbacks.

In [9], the multi-bid auction (a one-shot version of PSP) has been proposed. Each player submits multiple bids only once, thereby providing an approximation of her own valuation function. The market clearing price and the allocation can then be computed. Here again, incentive compatibility and efficiency are proved (up to a given constant). Compared with PSP, the scheme has the advantage that no bid profile needs to be diffused through the network and that there is no convergence phase to equilibrium, thus yielding a gain in engineering and economic efficiency, especially when players enter and leave the game randomly.

The goal of this paper is to numerically highlight and illustrate the gain 
that can be obtained by multi-bids over PSP. We place ourselves in a stochastic 
environment, with users of different types entering and leaving the game at 
random times, and investigate the transient (for a given trajectory) behavior 
and steady-state performance of both multi-bids and PSP. We especially focus 
on three criteria: network revenue, social welfare and computational complexity. 

Note finally that there exist other auction schemes in the literature [2,10,13]. Due to space limitations, and since our main purpose is to emphasize the degree of improvement obtained by using multi-bids instead of PSP, we do not include them here.

The layout of the paper is as follows. In Section 2, we present the stochastic 
model that will be used to describe the system behavior. In Section 3 we present 
the PSP mechanism and its properties; the same is done for the multi-bid scheme 
in Section 4. Section 5 illustrates the gain that can be obtained by using the 
multi-bid scheme in a stochastic environment; both transient and steady-state 
results are provided. Finally, we conclude in Section 6. 

2 General Model 

To study the behavior of the auction schemes in a stochastic environment, we model the system, for convenience, as a Markov process.

Consider a single communication link of capacity $Q$. We assume that there exists a finite number $T$ of different valuation (or willingness-to-pay) functions, corresponding for instance to different sets of applications. A player/user $i$ is then characterized by her type $t_i \in \{1, \dots, T\}$.

Players compete for bandwidth. To model their behavior, we represent their perception/valuation of the service they can get by a quasi-linear utility function of the form

$$u_i(s) = \theta_{t_i}(a_i(s)) - c_i(s), \qquad (1)$$

where $\theta_{t_i}$ is the valuation function of a type-$t_i$ player, which depends on the quantity of resource $a_i$ received, and $c_i$ is the total cost charged to player $i$. Both $c_i$ and $a_i$ depend on the auction scheme used and on the whole set of bids $s$ (where the meaning of "bid" depends on the auction scheme).

We assume that new players enter the game according to a Poisson process with rate $\lambda$, and that the type of a new player is chosen according to a discrete probability distribution $P_t$, so that the arrival rate of type $u$ is $\lambda_u = \lambda P_t(u)$. We also assume that the sojourn time of each type-$u$ player is exponentially distributed with rate $\mu_u$ (here independent of the accumulated bandwidth obtained, as for real-time applications). Let $\mathcal{I}(\tau)$ be the set of active players at time $\tau$ (i.e., the set of players present in the game at this time) and $I(\tau)$ the total number of players at time $\tau$.
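With Poisson arrivals and exponential sojourn times, the population of active players is a continuous-time Markov chain, in which the type-$u$ players present leave at total rate $n_u \mu_u$. The following event-driven simulation is a minimal sketch of that population process only (no auction is run). The function name, its parameters and the values in the usage line are hypothetical.

```python
import random


def simulate_players(lam, type_probs, mu, horizon, seed=0):
    """Simulate the number of active players of each type up to time `horizon`.

    Arrivals: Poisson with rate lam, type u drawn with probability type_probs[u].
    Departures: each type-u player leaves after an Exp(mu[u]) sojourn time, so
    type-u departures occur at total rate n[u] * mu[u].
    Returns a list of (event time, tuple of per-type counts).
    """
    rng = random.Random(seed)
    types = range(len(type_probs))
    n = [0] * len(type_probs)
    t = 0.0
    path = [(t, tuple(n))]
    while True:
        dep_rates = [n[u] * mu[u] for u in types]
        total_rate = lam + sum(dep_rates)
        t += rng.expovariate(total_rate)      # time to the next event
        if t > horizon:
            return path
        if rng.random() * total_rate < lam:   # the event is an arrival
            u = rng.choices(types, weights=type_probs)[0]
            n[u] += 1
        else:                                 # the event is a departure
            u = rng.choices(types, weights=dep_rates)[0]
            n[u] -= 1
        path.append((t, tuple(n)))


path = simulate_players(lam=1.0, type_probs=[0.5, 0.5], mu=[0.1, 0.2], horizon=50.0)
print(len(path), path[-1])
```

Because sojourn times do not depend on the other players, each type behaves as an M/M/∞ queue, so in steady state the mean number of type-$u$ players is $\lambda_u / \mu_u$.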

To ensure that bandwidth is not sold at too low a price, the seller can be seen as a (permanent) player, denoted 0, with valuation function $\theta_0(q) = p_0 q$. The reserve price $p_0$ guarantees that no bandwidth will be sold at a unit price below $p_0$.

Our goal is to compare the behavior of PSP and multi-bids. Let us now recall 
the basic concepts of both schemes. 



3 Progressive Second Price Auction [5,11] 

In PSP, a player $i$ submits a two-dimensional bid $s_i = (q_i, p_i) \in S_i = [0, Q] \times [0, +\infty)$, where $q_i$ is the desired quantity of resource and $p_i$ the unit price player $i$ is willing to pay for that resource. $s = (s_1, \dots, s_I)$ denotes the bid profile, and $s_{-i} = (s_1, \dots, s_{i-1}, s_{i+1}, \dots, s_I)$ the bid profile that player $i$ faces, so that $s = (s_i; s_{-i})$ (the dependence on time $\tau$ is omitted to simplify the notation). The PSP allocation and charge to player $i$ are

$$a_i(s) = q_i \wedge \Big[Q - \sum_{k \neq i,\, p_k > p_i} q_k\Big]^+, \qquad (2)$$

$$c_i(s) = \sum_{j \neq i} p_j \big[a_j(s_{-i}) - a_j(s_i; s_{-i})\big], \qquad (3)$$

where $x \wedge y = \min(x, y)$ and $[x]^+ = \max(x, 0)$. Thus the players bidding highest get the bandwidth they request, and the total charge corresponds to the declared willingness to pay of the players who are excluded by player $i$'s bid.
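Rules (2) and (3) can be evaluated directly for a given bid profile. The sketch below is illustrative only and makes these assumptions: bids are given as a dictionary mapping player ids to `(quantity, unit price)`, the function names are hypothetical, and the seller/reserve price and the bid-fee dynamics are not modeled. Like (2) as written, it does not split ties, so with equal unit prices the total allocation could exceed $Q$.

```python
def psp_allocation(bids, Q):
    """Eq. (2): a_i = min(q_i, [Q - sum of q_k over players k != i with p_k > p_i]^+).

    bids maps a player id to (q, p); ties in unit price are not split here.
    """
    alloc = {}
    for i, (q_i, p_i) in bids.items():
        above = sum(q_k for k, (q_k, p_k) in bids.items() if k != i and p_k > p_i)
        alloc[i] = min(q_i, max(0, Q - above))
    return alloc


def psp_charge(bids, Q, i):
    """Eq. (3): sum over j != i of p_j * (a_j(s_{-i}) - a_j(s))."""
    with_i = psp_allocation(bids, Q)
    without_i = psp_allocation({k: b for k, b in bids.items() if k != i}, Q)
    return sum(bids[j][1] * (without_i[j] - with_i[j]) for j in without_i)


bids = {1: (6, 3.0), 2: (5, 2.0), 3: (4, 1.0)}   # hypothetical (quantity, unit price)
print(psp_allocation(bids, Q=10))                # {1: 6, 2: 4, 3: 0}
print([psp_charge(bids, 10, i) for i in bids])   # [6.0, 4.0, 0.0]
```

Player 1's charge of 6 is the declared value of what he displaces: one unit of player 2 at unit price 2 and four units of player 3 at unit price 1.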

Each time a player submits a bid, she tries to maximize her utility, and a bid fee $\varepsilon$ is charged to her. Under some concavity and regularity assumptions on the functions $\theta_u$, when the number of players is fixed and players bid sequentially, the game is proved to converge to a so-called $\varepsilon$-Nash equilibrium, in which no player can unilaterally improve her utility by more than $\varepsilon$. The scheme is also proved to be incentive compatible (meaning that users' best interest is to truly reveal their willingness to pay) and efficient, in the sense that the social welfare $\sum_{i \in \mathcal{I}(\tau) \cup \{0\}} \theta_{t_i}(a_i)$ is asymptotically maximized (once the algorithm has converged).

Since users enter and leave the game, efficiency may become an issue. We suppose here that each type-$u$ player in the game has the opportunity to submit a new bid at different times. Inter-bid times are assumed to follow an exponential distribution with parameter $\nu_u$, independent of all other random variables. When a new player arrives, she is assumed to submit an optimal bid (meaning that she knows the bid profile).
4 Multi-bid Auction [9]

In the multi-bid scheme, users submit, when they enter the game, a set of $M$ two-dimensional bids $s_i = (s_i^1, \dots, s_i^M)$, where for all $m$, $1 \le m \le M$, $s_i^m = (q_i^m, p_i^m)$ is as in PSP (the seller submits just one two-dimensional bid $s_0 = (q_0, p_0)$, with $q_0 \ge Q$ and $p_0$ the reserve price). We assume without loss of generality that bids are sorted such that $p_i^1 \le p_i^2 \le \dots \le p_i^M$. Unlike in PSP, the bids are submitted just once, so users do not submit new bids at later epochs. This reduces the signaling overhead.

From the multi-bids of all competing players at time $\tau$, the so-called pseudo-demand function of user $i$ can be computed as the function $d_i : \mathbb{R}^+ \to \mathbb{R}^+$ defined by

$$d_i(p) = \begin{cases} 0 & \text{if } p_i^M < p,\\ \max_{1 \le m \le M} \{ q_i^m : p_i^m \ge p \} & \text{otherwise}. \end{cases} \qquad (4)$$

The pseudo-aggregated demand function is the function $d : \mathbb{R}^+ \to \mathbb{R}^+$ defined by $d(p) = \sum_{i \in \mathcal{I}(\tau) \cup \{0\}} d_i(p)$, where $d_0(p) = q_0 \mathbb{1}_{\{p \le p_0\}}$ (apply (4) with $M = 1$).

From the pseudo-aggregated demand function, we define the pseudo-market clearing price $u$ by

$$u = \sup\{p : d(p) \ge Q\}. \qquad (5)$$

Such a $u$ always exists, since $d(0) \ge d_0(0) = q_0 \ge Q$. Moreover, for $p > \max_{j \in \mathcal{I}(\tau) \cup \{0\}} p_j^M$ we have $d(p) = 0$, and therefore $u < +\infty$.

We now describe the allocation and pricing rules. First define, for every function $f : \mathbb{R} \to \mathbb{R}$ and all $x \in \mathbb{R}$, $f(x^+) = \lim_{z \to x,\, z > x} f(z)$.

The allocation, recomputed each time a player enters or leaves the game, is

$$a_i(s_i, s_{-i}) = d_i(u^+) + \frac{d_i(u) - d_i(u^+)}{d(u) - d(u^+)} \big(Q - d(u^+)\big), \qquad (6)$$

meaning that each player receives the quantity $d_i(u^+)$ she asks for at $u^+$, the lowest price at which supply exceeds pseudo-demand, and the excess resource is shared among the players who submitted a bid with price $u$, in proportion to $d_i(u) - d_i(u^+)$.

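The total charge is computed according to the second-price principle [1,15] (but using the pseudo-demand functions instead of the real ones):

$$c_i(s_i, s_{-i}) = \sum_{j \in \mathcal{I}(\tau) \cup \{0\},\, j \neq i} \int_{a_j(s)}^{a_j(s_{-i})} \bar{\theta}'_{t_j}(q)\,dq, \qquad (7)$$

with $\bar{\theta}'_{t_j}$ the pseudo-marginal valuation function of $j$, defined by

$$\bar{\theta}'_{t_j}(q) = \begin{cases} 0 & \text{if } q_j^1 < q,\\ \max_{1 \le m \le M} \{ p_j^m : q_j^m \ge q \} & \text{otherwise}. \end{cases} \qquad (8)$$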
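Rules (4)-(8) are mechanical enough to sketch directly. The Python below is a minimal, illustrative implementation, not the one used in [9], and rests on these assumptions:

- bids are `(quantity, unit price)` pairs;
- the seller is player 0 with a single bid $(q_0, p_0)$ and $q_0 \ge Q$;
- all helper names are hypothetical;
- each integral in (7) runs from the smaller to the larger allocation, i.e., removing player $i$ does not shrink the others' allocations.

```python
def d_i(bids, p):
    """Eq. (4): largest quantity among the bids (q, price) whose price is >= p."""
    return max((q for q, price in bids if price >= p), default=0)


def d_i_plus(bids, p):
    """Right limit d_i(p+): only bids priced strictly above p count."""
    return max((q for q, price in bids if price > p), default=0)


def allocate(players, Q, q0, p0):
    """Clearing price (5) and allocation (6); the seller is player 0 bidding (q0, p0)."""
    everyone = {0: [(q0, p0)], **players}

    def d(p):
        return sum(d_i(b, p) for b in everyone.values())

    # d is a left-continuous step function, so the sup in (5) is attained at a bid price.
    u = max(price for b in everyone.values() for _, price in b if d(price) >= Q)
    d_u = d(u)
    d_u_plus = sum(d_i_plus(b, u) for b in everyone.values())
    alloc = {}
    for i, b in everyone.items():
        share = (d_i(b, u) - d_i_plus(b, u)) / (d_u - d_u_plus)
        alloc[i] = d_i_plus(b, u) + share * (Q - d_u_plus)
    return u, alloc


def marginal_integral(bids, a, b):
    """Integral over [a, b] (a <= b) of the pseudo-marginal valuation (8)."""
    total, lo = 0.0, 0.0
    for q_k in sorted({q for q, _ in bids}):
        height = max(price for q, price in bids if q >= q_k)   # value on (lo, q_k]
        total += height * max(0.0, min(b, q_k) - max(a, lo))
        lo = q_k
    return total


def charge(players, Q, q0, p0, i):
    """Eq. (7): pseudo-value the others (seller included) lose because of player i."""
    _, with_i = allocate(players, Q, q0, p0)
    others = {k: v for k, v in players.items() if k != i}
    _, without_i = allocate(others, Q, q0, p0)
    everyone = {0: [(q0, p0)], **others}
    return sum(marginal_integral(everyone[j], with_i[j], without_i[j])
               for j in everyone)


# Hypothetical bids (quantity, unit price), sorted by increasing price.
players = {1: [(8, 1.0), (4, 3.0)], 2: [(6, 2.0), (2, 4.0)]}
print(allocate(players, Q=10, q0=10, p0=0.1))   # (2.0, {0: 0.0, 1: 4.0, 2: 6.0})
print(charge(players, 10, 10, 0.1, 1), charge(players, 10, 10, 0.1, 2))
```

With these hypothetical bids the clearing price is $u = 2$: player 1 gets 4 units and player 2 gets 6. Without player 1, the seller would keep 4 units valued at the reserve price 0.1, so player 1 pays 0.4. Without player 2, player 1 would get 4 more units valued at 1 and the seller would keep 2, so player 2 pays 4.2.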
As for PSP, incentive compatibility (each user $i$ should reveal its bandwidth valuation, i.e., $p_i^m = \theta'_{t_i}(q_i^m)$ for all $m$) and efficiency are proved, but here only up to a controlled constant (see [9] for details).

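It is shown in [9] that it is in the players' interest to submit a uniform quantile repartition of their bids, i.e., bids $(q_i^m, p_i^m = \theta'_{t_i}(q_i^m))$, $1 \le m \le M$, such that

$$\int_{d_i(p_i^{m+1})}^{q_i^m} \big(\theta'_{t_i}(q) - p_i^m\big)\,dq = C_i \quad \forall m, \qquad (9)$$

for some player-specific constant $C_i$.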

5 Comparison of Performance 

Multi-bids present the following advantages with respect to PSP:

– Since the bids are submitted exactly once, no convergence phase (resubmitting new bids until an equilibrium is reached) is required. It might be argued that in some cases the mean number of re-submissions up to equilibrium is less than the number $M$ of multi-bids. This is less likely when customers arrive and leave the game, since each arrival or departure then requires a new re-submission phase for every player in PSP, whereas nothing has to be done for the players already present when using multi-bids.

– Following the same idea, when submitting a new bid in PSP, each player is assumed to know the bid profile, meaning that it is advertised to all players. This is not required for the multi-bid scheme, thus saving a lot of signaling overhead.

We propose to illustrate these advantages of multi-bids in the following subsections. We especially wish to show that this gain in terms of signaling/complexity does not come at the expense of efficiency, in terms of seller's revenue or social welfare. This holds both on a trajectory, during the convergence phase of PSP, and in steady state; in fact, the converse holds.



5.1 Transient Analysis 

Figure 1 displays the behavior of PSP and multi-bids when the number of players is fixed, until equilibrium is reached for PSP, with two types of players: three type-1 and two type-2 players. The upper left-hand figure displays the valuation and marginal valuation functions for both types of players, which we used in all our simulations. The lower left-hand figure represents the social welfare $\sum_{i \in \mathcal{I}(\tau) \cup \{0\}} \theta_{t_i}(a_i)$. Since the number of players is fixed during the simulation, multi-bid allocations and charges are fixed, and the social welfare is 60.37, very close to the optimal value 60.71. For PSP auctions, on the other hand, the social welfare changes at each re-submission by a user (resulting in the discontinuous curve) and reaches equilibrium (with value 60.68) around time $\tau = 26$, but it is lower than with multi-bids before equilibrium is reached. The lower right-hand side of Figure 1 represents the network revenue for both schemes. Again, multi-bid revenue is constant because the number of players is fixed. The revenue for PSP first increases, overtaking the multi-bid revenue after a while, but then drops below it just before reaching equilibrium. Actually, we proved in [7] that when the total demand at the unit price $p_0$ exceeds the available capacity $Q$, the revenue with PSP in equilibrium tends to $p_0 \times Q$ (i.e., all the resource is sold at the reserve price) when the bid fee $\varepsilon$ tends to 0.

(Figure panels: "$\theta'_1(q) = 5 - 0.5q$; $\theta'_2(q) = 10 - 2q$"; "Social Welfare, $M = 3$, $\nu_1 = 5$, $\nu_2 = 5$"; "Number of players"; "Network revenue, $Q = 10$, $p_0 = 0.1$, $\varepsilon = 0.1$". Horizontal axis: Time. Curves: Multi-bids, PSP, Maximum.)

Fig. 1. Comparison of PSP and multi-bids for a fixed number of players, until convergence is reached for PSP

Figure 2 illustrates the behavior of both schemes on a trajectory, with players entering and leaving the game¹. Here the number of players of each type varies, as shown in the upper right-hand figure. The social welfare curves show that, when using multi-bids, the resulting social welfare is always very close to the optimal one, whereas, due to the convergence phase, there is a loss of efficiency when using PSP. Similarly, on this trajectory, the network revenue generated by multi-bids is significantly larger than that generated by PSP.

¹ The parameters we chose are specified in the figure, and were also used for the study of steady-state performance.
Fig. 2. Behavior of PSP and multi-bids on a trajectory, with players entering and leaving the game



5.2 Steady-State Analysis 

Figure 3 illustrates the evolution of the mean efficiency ratio (steady-state social welfare divided by the optimal one), the mean network revenue and the complexity of the algorithm as the number $M$ of multiple bids increases, and compares these performance measures with those obtained for PSP. The complexity of computing PSP allocations and prices is of order $O(I^2)$ [11], and the complexity of the multi-bid auction is of order $O(M \times I^2)$ [9]. We therefore display the mean number of applications of each auction rule per unit of time, multiplying this number by $M$ for the multi-bid auction. This curve does not give the exact number of elementary operations performed, but only an idea of how the computational complexity evolves when the parameters vary. Note that this measure of complexity does not include the signaling overhead necessary for PSP. It can be observed that for small values of $M$, the computational complexity is even smaller with multi-bids. More importantly, thanks to the one-shot property of multi-bids (i.e., the fact that, unlike PSP, no convergence phase is required), steady-state social welfare (for $M > 2$) and revenue are larger with multi-bids.
Fig. 3. Steady-state performance of multi-bids for an increasing number M of allowed
two-dimensional bids in multi-bid auctions, compared with PSP 



Figure 4 displays the evolution of the efficiency ratio, network revenue and complexity as the arrival rate increases (with all other parameters fixed).

Again, multi-bids are shown to provide better performance. The difference increases with $\lambda$. This is because the number of players varies more frequently, so convergence to optimal values is less likely to occur for PSP, whereas this does not affect multi-bids.

Figure 5 illustrates the three criteria considered in this paper when the bid re-submission rate varies for both types of players. Even when this rate increases, leading to a larger computational complexity, the multi-bid auction still outperforms PSP in terms of efficiency and network revenue.



6 Conclusions 

The goal of this paper was to compare PSP and multi-bid schemes, two auction mechanisms for bandwidth allocation in telecommunication networks. To this end, we have considered a model representing a communication link, with players applying for connections at random epochs and for a random duration. Our conclusion is that the multi-bid auction scheme not only significantly reduces the signaling overhead of PSP, but also yields larger social welfare and network revenue (for social welfare, at least in this stochastic context).

Fig. 4. Steady-state performance comparison when the charge increases

Fig. 5. Steady-state performance comparison when the frequency of re-submission increases in PSP

As future work, we plan to extend the multi-bid auction scheme to a whole network. We already have some results for a tree network, which properly represents the case where the backbone network is overprovisioned and the access networks have a tree structure.
Abstract. In this paper, we propose a Distributed Multi-link Auction mechanism (referred to as DiMA) that handles the establishment of connection requests in order to provide end-to-end service guarantees. The DiMA mechanism determines hop by hop the path to be taken by a request while reserving the required resources along it. It consists of consecutive local auctions that each request has to win in order to be satisfied. We consider the problem of determining the requests' global budgets and bidding strategies. We give a simulation analysis mapping the relation between prices, network utilization and the distribution of accepted requests.

Keywords: Packet network, Quality of Service, distributed auction strategies, pricing.



1 Introduction 

Today's Internet resources are shared equally, so the only possible service is best effort. With the growth of the Internet, several traffic flows belonging to new sophisticated applications have specific requirements and need Quality of Service (QoS) guarantees. These services can pay to get a given level of QoS, while best effort still guarantees connectivity to classical traffic. The question that arises is how to price the different proposed QoS levels efficiently.

Some pricing models have been proposed in the context of QoS-enabled architectures, including ATM, MPLS and QoS over IP (Integrated Services, RSVP). In such models, resources are negotiated and priced on a connection-setup timescale in order to meet QoS requirements. These pricing models can be classified according to the overlay architecture: ATM [12,1,14,13], MPLS [3], Intserv [16,8]. However, some mechanisms are not specific to a particular architecture and can be applied to any environment where resources are allocated in advance to guarantee QoS requirements. They can be grouped around three main concepts: adjustment schemes, effective bandwidth and auctions.

J. Sole-Pareta et al. (Eds.): QofIS 2004, LNCS 3266, pp. 328-337, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

Distributed Multi-link Auctions for Network Resource Reservation

Pricing models that use adjustment schemes [5,12] [19,9] come in different flavors but share the same concept: prices are updated until an equilibrium point is reached. The objective function to be optimized is the social welfare (a combination of network benefit and user surplus) under capacity constraints (bandwidth and/or buffer) that guarantee the QoS level. The problem is decomposed into a network problem of updating prices and a user problem of choosing an adequate resource demand given its price. In [9], an arbitrager layer is introduced to remove direct consideration of network resources and focus on the QoS parameters: the arbitrager negotiates with users the optimal amount of, for instance, loss probability, and then purchases the needed bandwidth from the network. In [6], an intelligent agent is introduced to replace the user in choosing the willingness to pay, based on the user's learned preferences.

An important concept for supporting QoS over IP is effective bandwidth, which corresponds to the notion of equivalent bandwidth in ATM. It maps the QoS required to satisfy a request into an amount of bandwidth that can be used by the admission control rule. In [10], a simple linear tariff that is tangent to the bandwidth evolution curve is proposed. This gives users an incentive to declare their real requirements as precisely as possible.
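To make this mapping concrete, the sketch below uses Kelly's effective-bandwidth formula for a source whose load per time slot is i.i.d., alpha(s) = (1/s) log E[exp(sX)], applied to a hypothetical on-off source. This is one standard definition, not necessarily the one used in [10]. The space parameter s (an assumption here) tunes how conservative the estimate is: it tends to the mean rate as s goes to 0 and approaches the peak rate as s grows. A simple admission rule then accepts flows while their effective bandwidths sum to at most the link capacity.

```python
import math


def onoff_effective_bandwidth(peak, p_on, s):
    """Kelly-style effective bandwidth of an i.i.d. on-off source.

    In each slot the source sends `peak` units with probability `p_on`
    and nothing otherwise, so E[exp(s*X)] = 1 - p_on + p_on*exp(s*peak).
    """
    mgf = 1.0 - p_on + p_on * math.exp(s * peak)
    return math.log(mgf) / s


def admit(effective_bws, capacity):
    """Admission rule: accept if the effective bandwidths fit on the link."""
    return sum(effective_bws) <= capacity


eb = onoff_effective_bandwidth(peak=10.0, p_on=0.2, s=0.5)
print(round(eb, 3))                   # lies between mean (2.0) and peak (10.0)
print(admit([eb] * 7, capacity=50))   # 7 such flows need about 47.8 units
```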

Incentive compatibility is an important issue in pricing. To be efficient, a pricing model should give users incentives to declare their real valuations of services. Auctions have been studied [4,11] as an interesting alternative for pricing network resources, since they address this issue.

In [7], we introduced the Distributed Multi-link Auction mechanism (referred to as DiMA), which handles connection establishment by reserving the required resource hop by hop on a possible path. We studied the complexity of considering more than one possible path for each request, proved that this problem is NP-hard, and proposed a heuristic approach to solve it. Whereas [7] focused on the complexity and theoretical properties of the model, here we address implementation and technical issues related to connection establishment and bidding strategies.

Layout: We first describe the communication model we have adopted and recall the concept of DiMA. We then address the problem of bidding strategies, and finally provide simulation results highlighting how the DiMA mechanism can be deployed to allocate network resources efficiently.

2 Communication Model 

2.1 Managing Quality of Services 

Many architectures have been proposed to provide QoS support while preserving scalability.

Integrated Services (Intserv) have been considered as an alternative to provide QoS guarantees. Resources are reserved for real-time applications before the data transfer by using a signaling protocol (RSVP). Statistical multiplexing can be deployed for aggregated flows; however, each router needs to reserve resources for each flow, so per-flow state must be maintained at each crossed router. This raises a scalability problem, in addition to imposing significant requirements on routers [20].
With Diffserv [15,20], some differentiation can be given to services in a scalable way without any resource reservation, but then it is impossible to give any deterministic guarantee on the service. The Intserv architecture can be implemented at lower levels (the access network) to manage individual flows, and Diffserv can then be implemented on the backbone by adding flow admission control capabilities to the edge nodes of a statistically provisioned Diffserv network region [2].

A second possibility [21] is to use a Bandwidth Broker (BB) that has sufficient knowledge of resource availability and network topology. The BB can then perform admission control and resource reservation for aggregated flows, freeing core routers from admission control decisions. However, in this model, the BB needs to manage the whole network domain and to store information about flows and paths, which again raises a scalability problem.

This limitation can be removed, as in [17], by distributing the functions assigned to the BB among all nodes of the network domain in a scalable way. Although nodes need to maintain state information for every reservation, the authors claim that available routers have sufficient memory. Indeed, aggregation, and thus classification and scheduling, is performed on a per-class basis. In addition, a label switching mechanism can be deployed to simplify packet forwarding. The proposed architecture includes per-flow signaling with enhanced scalability and resource reservation for aggregated flows at both core and access nodes.

We adopt this approach for our model: we suppose that traffic is aggregated into defined traffic classes with specific requirements. Effective bandwidth techniques can be used to determine the amount of bandwidth necessary to meet the QoS requirements. Hence each request corresponds to an inelastic demand and therefore requires a fixed amount of bandwidth.

2.2 Connection Establishment 

DiMA consists of establishing connections for traffic classes on demand by reserving the required resources on appropriate paths before data transmission. Such demand-driven connections require on-line treatment. Moreover, due to delay and control constraints, a centralized process seems unrealistic. Hence, in our model, request acceptance and routing decisions are decentralized and made by the crossed routers.

The model can be described as follows. Each node of the network corresponds to a router and can also be the origin and/or the end of a connection. For instance, suppose that a node u wants to establish a connection with a node v in the network. Node u emits a Connection Demand (CoD), i.e., a packet containing the destination address and the required capacity. Each router that the CoD reaches determines, using its local routing function, the outgoing link(s) that the CoD could take to the next router towards its destination.

At each time step, each router performs the following steps:

• Collecting the CoDs that arrive.

• Selecting the CoDs to be accepted, i.e., satisfying their demand while respecting link capacity constraints and local routing tables.

• Storing accepted requests and sending them on the corresponding outgoing links.

• Updating the amount of available capacity on each outgoing link.

• Destroying the non-accepted CoDs.

• Sending a Destruction of Connection (DoC) packet on each corresponding incoming link.

At each time step, the first operation performed by each router is to process the DoCs it receives by:

• Liberating the allocated capacity on the outgoing link used by the corresponding connection.

• Sending the DoC back on the incoming link of the connection.

When a communication is accepted end-to-end, the resources remain reserved for the two communicating nodes until one of them terminates the communication by sending a DoC to the other node.
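The per-time-step behavior of a router can be sketched as follows. This is a hypothetical, simplified model: class and field names are illustrative, capacities are integers, and CoDs are served in arrival order with best-fit link choice (in DiMA, the auction of Section 2.3 decides which CoDs are selected). DoCs for refused CoDs and for torn-down connections are returned as (incoming link, request id) pairs.

```python
from dataclasses import dataclass


@dataclass
class CoD:
    """Connection Demand: request id, destination, required capacity, arrival link."""
    req_id: int
    dest: str
    demand: int
    in_link: str


class Router:
    """Hypothetical router state for the connection-establishment model."""

    def __init__(self, capacity, routing):
        self.free = dict(capacity)   # outgoing link -> available capacity
        self.routing = routing       # destination -> candidate outgoing links
        self.conns = {}              # req_id -> (in_link, out_link, demand)

    def step(self, docs, cods):
        """One time step: handle received DoCs first, then the new CoDs.

        Returns (forwarded, upstream): CoDs sent on outgoing links as
        (out_link, cod) pairs, and DoCs sent back as (in_link, req_id) pairs.
        """
        upstream = []
        for req_id in docs:
            in_link, out_link, demand = self.conns.pop(req_id)
            self.free[out_link] += demand            # liberate capacity
            upstream.append((in_link, req_id))       # send the DoC back
        forwarded = []
        for cod in cods:                             # arrival order, not an auction
            fits = [link for link in self.routing.get(cod.dest, [])
                    if self.free[link] >= cod.demand]
            if fits:
                out = max(fits, key=lambda link: self.free[link])  # best fit
                self.free[out] -= cod.demand                       # update capacity
                self.conns[cod.req_id] = (cod.in_link, out, cod.demand)
                forwarded.append((out, cod))                       # store and forward
            else:
                upstream.append((cod.in_link, cod.req_id))         # destroy, emit DoC
        return forwarded, upstream


r = Router({"a": 5}, {"v": ["a"]})
fwd, up = r.step([], [CoD(1, "v", 3, "in1"), CoD(2, "v", 3, "in2")])
print([c.req_id for _, c in fwd], up, r.free)  # [1] [('in2', 2)] {'a': 2}
fwd, up = r.step([1], [])
print(up, r.free)                              # [('in1', 1)] {'a': 5}
```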

2.3 The DiMA Auctions 

In our model, auctions are used to select the requests to be accepted while respecting capacity constraints. As explained before, request acceptance and routing decisions are distributed over the involved routers, so only local information can be used to make them. Moreover, the path assigned to a request (if accepted) is discovered hop by hop based on local routing tables, subject to capacity constraints and auction results.

Each request has a global budget that it is willing to pay to get the desired resource on the entire route to its destination. A partial (per-link) budget is derived from it (see Section 3.2) to participate in local auctions; in a distributed management setting, only such a format is feasible for consecutive auctions. Each request has to win all local auctions on its route to be accepted. Auctions take place on a fast timescale and consider only new requests, since the bandwidth assigned to already accepted requests is not renegotiated.

Since the auction problem at a router is NP-hard in the general case, we proposed in [7] a heuristic to solve it. This heuristic solves the problem exactly for linear and tree topologies, where only one path to each destination exists. It consists of three steps:

• Sort the requests present at this step in decreasing order of their unitary budgets, i.e., partial budget / demand.

• Accept requests in this order, each on one corresponding output link, while keeping the capacity constraints satisfied.¹

• Accepted requests on an output link pay, per unit of bandwidth, the first non-accepted unitary bid on that link.

¹ Among the possible output links, we can choose either the first one with available capacity (first fit) or the one with the highest available capacity (best fit).

For a request, a corresponding output link is the next link on a shortest path to the destination.² In the auction algorithm, we choose, among the corresponding output links, the one with the highest available capacity (best fit heuristic). Accepted requests are then routed to the next corresponding router.
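The three steps of the heuristic can be sketched for a single router as below. Assumptions not stated in the text: requests are tuples (id, destination, demand, partial budget), `routes` maps a destination to its candidate output links, best fit is used, and a link's unit price is the unit bid of the first refused request that had that link among its candidates (0 if none). The text does not say what happens when a smaller, lower-bidding request still fits after a refusal; the literal rule would then charge it more than its own unit bid, and this sketch does not treat that case specially.

```python
def dima_auction(requests, free, routes):
    """One local DiMA auction at a router (a sketch of the heuristic of [7]).

    requests: list of (req_id, dest, demand, partial_budget)
    free:     dict output link -> available capacity
    routes:   dict dest -> candidate output links (next hops on shortest paths)
    Returns (accepted {req_id: link}, unit price per link, remaining capacity).
    """
    free = dict(free)
    # 1. Sort by unitary budget (partial budget / demand), decreasing.
    order = sorted(requests, key=lambda r: r[3] / r[2], reverse=True)
    accepted, first_refused = {}, {}
    for req_id, dest, demand, budget in order:
        fits = [link for link in routes[dest] if free[link] >= demand]
        if fits:
            # 2. Accept on the candidate link with most free capacity (best fit).
            link = max(fits, key=lambda l: free[l])
            free[link] -= demand
            accepted[req_id] = link
        else:
            for link in routes[dest]:
                # Requests are visited in decreasing bid order, so the first
                # refusal recorded on a link is its highest refused unit bid.
                first_refused.setdefault(link, budget / demand)
    # 3. Winners on a link pay, per unit of bandwidth, the first refused unit bid.
    prices = {link: first_refused.get(link, 0.0) for link in free}
    return accepted, prices, free


reqs = [(1, "v", 4, 40.0), (2, "v", 5, 30.0), (3, "v", 3, 12.0)]
acc, prices, left = dima_auction(reqs, {"a": 10}, {"v": ["a"]})
print(acc, prices, left)  # {1: 'a', 2: 'a'} {'a': 4.0} {'a': 1}
```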

The auction rule adopted here is a second-price rule adapted to inelastic demands, and incentive compatibility is preserved: the obtained unitary price of the resource is the minimum price such that accepted requests are assured that the refused ones could not bid higher. If a request is refused, the bandwidth previously reserved for it upstream is freed and can be allocated to requests that arrive later. If a request wins all auctions on the determined path, it is accepted and the required resource is reserved for the duration of the connection. For each time unit of its duration, it pays the sum of all prices resulting from the local auctions on its path. Due to the temporal dimension of the problem, this tariff is efficient if all request durations are in the same range (see Section 4.1). Otherwise, a multi-session auction problem can be considered, where auctions are rerun at each time interval. This problem has been studied in [18].



3 Bidding Strategies 

3.1 Global Budgets 

Global budgets can be a function of the demand, the duration and the distance between the origin and the destination of the corresponding request. In a single-network-manager context, the unitary budget can represent a priority given to a traffic class. Let us consider unitary budgets given by the following functions:

Function 1: An increasing function of demand, for instance $\mathit{bid} = \mathit{demand}$. Such a function gives priority to large requests.

Function 2: $\mathit{bid} = 1$. This function does not give any priority since all unitary bids are the same; hence, the auction mechanism is effectively not involved.

Function 3: A decreasing function of demand, for instance

$$\mathit{bid} = \left\lceil \frac{\ln(10 \cdot \mathit{demand})}{\mathit{demand}} \right\rceil .$$

It gives priority to small requests.

Example 1. Consider a request $r_1$ with $d_1 = 1$ and a request $r_2$ with $d_2 = 7$, and let $b_i$ be the unitary budget computed for $r_i$. Then with:

- Function 1: $b_1 = 1$ and $b_2 = 7$; priority is given to request $r_2$.

- Function 3: $b_1 = 3$ and $b_2 = 1$; priority is given to request $r_1$.

We have tested those budget functions to investigate whether the priority level 
is respected. We provide some numerical results in section 4. 
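The three unitary-budget functions and Example 1 can be checked with a short sketch. Function 3 is implemented here as the ceiling of ln(10 · demand) / demand, which is the reading that reproduces the values of Example 1; the function names are illustrative.

```python
import math


def function1(demand):
    """Increasing in demand: favours large requests."""
    return demand


def function2(demand):
    """Constant unit bid: no priority, so the auction plays no role."""
    return 1


def function3(demand):
    """Decreasing in demand: favours small requests."""
    return math.ceil(math.log(10 * demand) / demand)


for d in (1, 7):
    print(d, function1(d), function2(d), function3(d))
# 1 1 1 3
# 7 7 1 1
```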

² Non-shortest paths can be considered, but additional rules are then required to ensure routing stability.
3.2 Bid Distribution

The multi-link auction problem is closely related to the multi-session problem. In [18], the authors discuss this relation and outline issues that concern both dimensions, particularly bid distribution. They present the Markovian property of such auctions and derive Markovian bidding strategies from it. A similar Markovian property holds here: at a router, a request does not need to know the complete history of previous auctions, and the amount of budget to bid depends only on the remaining steps (the remaining distance to the destination). Indeed, the proof given in [18] only requires an inelastic valuation metric on successful connections. Since a request is only interested in winning all link auctions, we can substitute links for periods and obtain the same result.

Hence, we adopt a proportional Markovian strategy that consists of bidding, at each step, the remaining budget divided by the remaining distance. Note that we use a second-price rule: a request does not pay its offered bid but rather the first non-accepted bid. The difference is added to the remaining budget, giving the request more chances to be accepted in the following auctions.
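The proportional strategy, combined with the second-price refund, can be simulated along a fixed path. As simplifying assumptions, each hop is summarized by a hypothetical clearing price (the first refused unit bid there), the request wins a hop if its bid is at least that price, and the budget is expressed per unit of bandwidth.

```python
def proportional_bidding(budget, clearing_prices):
    """Proportional Markovian bidding along a path with second-price refunds.

    At each hop the request bids (remaining budget) / (remaining hops). It wins
    the hop if its bid reaches the hop's clearing price (hypothetical: the first
    refused unit bid there) and pays that price, not its bid, so the surplus
    stays available for later hops. Returns (accepted, prices paid at the hops
    won so far); on refusal, upstream reservations are released by DoCs.
    """
    remaining, paid = budget, []
    hops = len(clearing_prices)
    for k, price in enumerate(clearing_prices):
        bid = remaining / (hops - k)   # depends only on what is left (Markovian)
        if bid < price:
            return False, paid
        paid.append(price)
        remaining -= price             # bid - price is carried forward
    return True, paid


print(proportional_bidding(9.0, [1.0, 1.0, 5.0]))  # (True, [1.0, 1.0, 5.0])
print(proportional_bidding(9.0, [1.0, 1.0, 8.0]))  # (False, [1.0, 1.0])
```

With a fixed share of 9/3 = 3 per hop and no carry-over, the first request would lose at the third hop, since 3 < 5.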



4 Experimental Studies 

4.1 Scenario 1

Due to the temporal dimension of the problem and the on-line approach adopted here, some high bidders may be denied the resource because it has already been allocated to earlier, lower bidders. The question is how often such configurations occur. One might expect these precedence constraints, induced by the on-line approach, to impair the efficiency of the mechanism. We propose a simple scenario to investigate this issue, under the assumption that connection durations are in the same range.

We consider a linear topology of 9 links, each with a fixed capacity of 500 units of bandwidth. A set of new requests is generated every 10 time slots between the first and the last node. Each request asks for 1 unit of bandwidth for a mean duration of 60 time slots. We consider two traffic classes with unitary budgets of 1 and 10, and compare the proportion of accepted requests belonging to each class for different request arrival rates. For instance, a rate equal to 5 means that 5 new requests arrive every 10 time slots. Figure 1 shows that requests with the highest bid have more chances to be accepted than those of the other class. This differentiation is more visible when the network is overloaded, and thus when auctions are more involved. We conclude that, in this scenario, the precedence problem does not significantly affect the efficiency of our mechanism.




D. Barth and L. Echabbi

[Figure 1: x-axis: arrival rate (5 to 55); y-axis values range from 0.25 to 0.75]

Fig. 1. Accepted request distribution for two types of traffic (two different bids)



4.2 Scenario 2

In this scenario, we consider a single-network-manager context where priority levels are given to different traffic classes by assigning them appropriate global budgets. For instance, the differentiation can be based on request demands (traffic volume).

We consider a network whose topology is a 10×10 grid. For simplicity, link capacities are fixed to 100 units of resource. A set of new requests is generated every 10 time slots. Request origins and destinations are drawn uniformly among the grid nodes. Connection durations are fixed to 50 time slots for all requests. Demands, in units of resource, are drawn uniformly in [2..7]. Global bids are determined by the functions of Section 3.1.

Figure 2 illustrates the impact of the budget function on the distribution of accepted requests. With Function 1, the network accepts more large requests than with Functions 2 and 3. Indeed, requests with large demands need the highest bids to be accepted. Small requests have more chances to be accepted because they fit more easily when the network is highly utilized.



4.3 Scenario 3

In this scenario, we simulate a hot spot in the grid: all requests are destined to this point. We observe how link prices evolve as requests approach the destination.

Fig. 2. Accepted request distribution for different bid functions




Fig. 3. Resource prices at different distances from the hot spot for different arrival rates
Figure 3 illustrates the evolution of prices on links at different distances from the grid center for different request arrival rates. Prices are highest on the links one hop from the destination, because these links are highly utilized: all requests have to use them. Prices are thus good indicators of network utilization.



5 Conclusions and Future Work 

The proposed DiMA mechanism is a realistic resource allocation scheme in which routing and resource reservation are distributed among the involved network nodes. The computed prices are good indicators of network utilization and can be used to design an efficient pricing scheme.

An open problem is how to propagate information about resource prices on different links so that it can be taken into account to improve decision making. A distributed mechanism (e.g., a flooding algorithm) can be used by routers to communicate local prices to their neighbors. This information can help routers choose an output link among several possible ones, so that requests are routed towards less congested directions. Moreover, requests can infer in which parts of the network they need to bid more.

For this purpose, routers need a procedure to summarize the information they receive and use it for decision making.
Abstract. We use tools from game theory to understand the impact of the inherent congestion pricing schemes in TCP Vegas, as well as of the parameter setting problem of TCP Vegas, on its performance. We show how these inherent pricing schemes result in a rate control equilibrium state that is a Nash equilibrium and also a global optimum of all-Vegas networks. On the other hand, if TCP Vegas users are assumed to be selfish in setting their desired number of backlogged packets in the buffers along their paths, then the network as a whole may, in certain circumstances, operate very inefficiently. This poses a serious threat to the possible deployment of Vegas-based TCP variants (such as FAST TCP) in the future Internet.



1 Introduction 

The Internet has been a huge success since its creation in the early 1970s, with a big impact on the way we interact and communicate. As the Internet evolves, it is shared by millions of end-points and many kinds of applications. They compete with each other for the shared resources, and their demand for resources (such as bandwidth) is growing rapidly. As a result, congestion at certain points of the network is inevitable. The TCP protocol suite was originally designed to control congestion in the Internet and to protect it from congestion collapse. Basically, TCP is a closed-loop control scheme: congestion in the network is fed back to the source in the form of losses (Reno-like versions) or delay (such as TCP Vegas). The source then reacts to the congestion signal from the network by reducing its transmission rate. In other words, we can consider packet loss and high queueing delay as the cost of (aggressively) sending packets into the network: given a fixed network, the higher the rate, the higher the cost (although the relationship is not necessarily linear). Furthermore, as the Internet has gradually transformed from a government-sponsored project into a private enterprise (or even a commodity), the economics of the Internet has become an increasingly important issue. Consequently, Internet connectivity and services will have to confront issues of pricing and cost recovery. In this perspective, the cost of congestion can take a monetary form. Introducing a cost of congestion into the network encourages balance, stability and high utilization of resources.

From what has been discussed so far, the cost of congestion, in our case, can be either a price or delay (when the application is delay-sensitive). It is suggested in [13] that congestion pricing could be implemented using a “smart market”, where the price for sending a packet varies on a very short timescale. Specifically, for a TCP implementation, the congestion price is updated every round-trip time.

A natural question then arises: why TCP Vegas? A number of reasons motivate us to re-examine TCP Vegas. Firstly, TCP Vegas has inherent pricing schemes in its design that resemble the congestion pricing schemes proposed in the literature. We believe that by better understanding TCP Vegas' inherent pricing schemes, we will gain better insight into understanding and designing pricing schemes for TCP traffic in general. Secondly, with the emergence of networks with very large bandwidth-delay products, such as transatlantic links with capacities in the range of 1 Gbps to 10 Gbps, new transport protocols have been proposed to better utilize the network in these circumstances. One promising proposal is FAST TCP [18]. Since the design of FAST TCP is heavily based on that of TCP Vegas, there is a need to reconsider the benefits and drawbacks of TCP Vegas to gain insight into the performance and possible deployment of FAST TCP in the future Internet.

Given the pricing schemes (parameter setting), another natural question arises: are these schemes efficient? Is there an equilibrium state from which no one has an incentive to deviate? Game theory (see [4] for a comprehensive introduction) provides the tools to answer these questions, as we illustrate later in the paper. We use these game-theoretic tools to investigate the impact of the pricing schemes and parameter setting in TCP Vegas on the performance of each user as well as of the network as a whole.

We divide the related work into two classes. The first class mainly deals with game-theoretic analysis of flow control mechanisms in the Internet. Scott Shenker, in his pioneering paper [2], used a game-theoretic approach to analyze flow control mechanisms (for Poisson arrivals) with different queueing disciplines at the routers. Korilis et al. [3] also used a game-theoretic approach to study the existence of equilibria in noncooperative optimal flow control, especially with QoS constraints. Recently, Akella et al. [1] used game-theoretic tools to examine the behavior of TCP Reno-like (loss-based) flow control under selfish parameter setting. Our work differs from theirs in that we study delay-based versions of TCP: we provide an extensive analysis of the parameter setting problem of traditional TCP Vegas, of the modified version of TCP Vegas (TCP Vegas under REM), and of FAST TCP. The second class deals with the mechanisms and issues of congestion pricing in the Internet. We mention here the work of MacKie-Mason et al. [13], [14], which applies economic theory (“club theory”) to study basic issues of congestion pricing in the Internet. Incentive-compatible pricing strategies in noncooperative networks are introduced and analyzed in [8]. A survey on Internet pricing and charging in general can be found in [6]. These papers deal with general pricing problems; our paper, on the other hand, deals specifically with TCP Vegas and its variants from a game-theoretic point of view.

The main contributions of the paper are as follows. First, we prove that, under the inherent congestion pricing schemes in TCP Vegas, there exists a unique Nash equilibrium of the rates of the TCP Vegas flows sharing a common network, and that this equilibrium rate vector is also system-wide optimal. Secondly, we provide an extensive game-theoretic analysis of the parameter setting problem in TCP Vegas. We conclude that, in this case, the Nash equilibria (if any) can be very inefficient. This implies that all-Vegas networks are vulnerable to selfish actions of end-users, posing a serious threat to the possible deployment of Vegas-based protocols (such as FAST TCP) in the future Internet.

The rest of the paper is organized as follows. The background on TCP is 
provided in Section 2. The TCP Vegas games are described and analyzed in 
detail in Section 3. Finally, Section 4 concludes the paper. 

2 Background 

2.1 TCP Vegas 

TCP Vegas was first introduced by Brakmo et al. [16]. Basically, it is a delay-based congestion control scheme that uses both queueing delay and packet loss as congestion signals. TCP Vegas tries to keep the number of its packets buffered along the path between α and β (α < β). Let w(t) denote the congestion window at time t, RTT the round-trip time, and baseRTT the smallest round-trip time observed so far (in practice, an estimate of the propagation delay). Denoting

$$\mathit{diff} = \frac{\mathrm{RTT} - \mathrm{baseRTT}}{\mathrm{RTT}}\, w,$$

the dynamics of the congestion window of TCP Vegas can be expressed as follows:

$$w(t+1) = \begin{cases} w(t) + 1 & \text{if } \mathit{diff} < \alpha, \\ w(t) - 1 & \text{if } \mathit{diff} > \beta, \\ w(t) & \text{otherwise.} \end{cases} \qquad (1)$$
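Equation (1) translates directly into a per-RTT update. The sketch below is a minimal illustration; the RTT values and thresholds in the usage lines are arbitrary.

```python
def vegas_update(w, rtt, base_rtt, alpha, beta):
    """One TCP Vegas congestion-window update, as in Equation (1).

    diff = w * (RTT - baseRTT) / RTT estimates the number of this flow's
    packets queued along the path.
    """
    diff = w * (rtt - base_rtt) / rtt
    if diff < alpha:
        return w + 1
    if diff > beta:
        return w - 1
    return w


print(vegas_update(20, rtt=102, base_rtt=100, alpha=1, beta=3))  # 21
print(vegas_update(20, rtt=110, base_rtt=100, alpha=1, beta=3))  # 20
print(vegas_update(20, rtt=150, base_rtt=100, alpha=1, beta=3))  # 19
```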

In a TCP Vegas/REM network [15], a slight modification is introduced into the updating mechanism of the congestion window. Each link $l$ (with capacity $c_l$) updates its link price $p_l(t)$ in period $t$ based on the aggregate input rate $x^l(t)$ and the buffer occupancy $b_l(t)$ as follows:

$$p_l(t+1) = \left[p_l(t) + \gamma\left(\mu_l b_l(t) + x^l(t) - c_l\right)\right]^+ \qquad (2)$$

where $\gamma > 0$ and $0 < \mu_l < 1$ are scaling factors of REM and $[z]^+ = \max(z, 0)$. Each source estimates the total price along its path and updates its sending rate accordingly. To feed the prices back to the sources, link $l$ marks each arriving packet in period $t$ that has not already been marked at an upstream link with probability $m_l(t)$, defined as:

$$m_l(t) = 1 - \phi^{-p_l(t)}$$

where $\phi > 1$ is a constant. Once a packet is marked, its mark is carried to the destination and then conveyed back to the source via acknowledgements, as in the ECN scheme. Source $i$ estimates the end-to-end marking probability by the fraction $\hat{m}_i(t)$ of its packets marked in period $t$, and estimates the path price $\hat{P}_i(t)$ by:

$$\hat{P}_i(t) = -\log_\phi\left(1 - \hat{m}_i(t)\right)$$
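The REM price update, the marking rule and the source's price estimate fit together as sketched below. Because each link marks with probability 1 - φ^(-p_l), a packet crosses the whole path unmarked with probability φ^(-Σ p_l), so -log_φ(1 - m̂_i) recovers the sum of link prices when the marking fraction is measured exactly. The parameter values (γ, μ_l, φ, rates) are illustrative assumptions.

```python
import math


def rem_price(p, b, x, c, gamma=0.001, mu=0.1):
    """One REM link-price update, Equation (2): [p + gamma*(mu*b + x - c)]^+."""
    return max(0.0, p + gamma * (mu * b + x - c))


def mark_prob(p, phi):
    """Marking probability m_l = 1 - phi**(-p_l) at a link with price p_l."""
    return 1.0 - phi ** (-p)


def path_price_estimate(end_to_end_mark, phi):
    """Source estimate P_i = -log_phi(1 - m_i) of the total path price."""
    return -math.log(1.0 - end_to_end_mark) / math.log(phi)


phi = 1.2
link_prices = [0.5, 1.0, 2.0]
unmarked = 1.0
for p in link_prices:                # a packet is marked at most once
    unmarked *= 1.0 - mark_prob(p, phi)
m_hat = 1.0 - unmarked               # end-to-end marking probability
print(round(path_price_estimate(m_hat, phi), 9))  # 3.5 = sum of link prices
print(rem_price(p=1.0, b=20, x=1100, c=1000))     # price rises under overload
```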

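The dynamics of the congestion window of TCP Vegas/REM can be expressed as follows:

$$w_i(t+1) = \begin{cases} w_i(t) + 1 & \text{if } \frac{w_i(t)}{\mathrm{RTT}_i(t)}\, \hat{P}_i(t) < \alpha_i, \\ w_i(t) - 1 & \text{if } \frac{w_i(t)}{\mathrm{RTT}_i(t)}\, \hat{P}_i(t) > \alpha_i, \\ w_i(t) & \text{otherwise.} \end{cases} \qquad (3)$$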

2.2 Throughput Models of TCP Vegas 

Throughout the paper, our game-theoretic analysis uses the following previously derived models.

Model 1 (Thomas Bonald's model)

In [17], the throughput of multiple flows sharing a bottleneck link is analyzed (using a fluid approximation) for both TCP Reno and TCP Vegas. Assume N TCP flows share a bottleneck link with capacity μ, propagation delay τ and buffer size B, and let α and β be the TCP Vegas parameters. The main results of [17] used in our analysis are the following:

- If Nα < B, there exists a finite time after which no loss occurs, and the window sizes stabilize in finite time. If α ≠ β, the congestion windows converge not to a single point but to a region, which implies unfairness among flows even in equilibrium. If α = β, then $w_1 = w_2 = \dots = w_N = \frac{\mu\tau}{N} + \alpha$ and the average rates are $\lambda_1 = \lambda_2 = \dots = \lambda_N = \frac{\mu}{N}$. Note that in this case the link is fully utilized.

- If Nα > B, TCP Vegas behaves exactly like TCP Reno. Let γ denote the multiplicative decrease factor of TCP Reno (typically 1/2). In this regime, [17] gives a closed-form expression for the total throughput $\lambda_{\text{total}}$ (depending on B, μτ and γ) and shows that, under the condition stated there, $\lambda_{\text{total}} < \mu$. This implies that in this case the link is not fully utilized.
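For the case Nα < B with α = β, the equilibrium values follow from a short argument, which the sketch below checks numerically. At equilibrium each flow keeps α packets in the buffer, so the queue holds Nα packets and RTT = τ + Nα/μ; the link is fully used and shared equally, so each flow gets μ/N, and its window is (μ/N)·RTT = μτ/N + α. The numerical values (packets per second, seconds) are illustrative.

```python
def vegas_equilibrium(n, mu, tau, alpha, buffer_size):
    """Equilibrium of N Vegas flows (alpha = beta) on one bottleneck link.

    Returns (window, rate, rtt) per flow, or None when N*alpha >= B, i.e.
    when the buffer cannot hold the targeted backlog (Reno-like regime).
    """
    if n * alpha >= buffer_size:
        return None
    rate = mu / n                      # link fully utilized, shared equally
    rtt = tau + n * alpha / mu         # propagation + queueing of N*alpha pkts
    window = mu * tau / n + alpha      # equals rate * rtt
    return window, rate, rtt


w, x, rtt = vegas_equilibrium(n=4, mu=100.0, tau=0.2, alpha=2, buffer_size=50)
print(w, x, round(rtt, 6))              # 7.0 25.0 0.28
print(round(w * (rtt - 0.2) / rtt, 6))  # Vegas diff equals alpha = 2
```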

Model 2 (Steven Low's model)

Steven Low et al. [10], [11], [12], [15] described an optimization framework to study the performance of TCP Vegas in a general network topology and under different queue management schemes at the routers. We highlight the result on the throughput of TCP Vegas under the REM queue management scheme (TCP Vegas/REM), which we use later in our analysis. It is proved in [11] that the equilibrium rate of TCP Vegas can be calculated as $\lambda_i = \alpha_i / p_i^*$, where $p_i^*$ denotes the equilibrium price. Note that this result holds for a general network topology (not restricted to a single bottleneck link).
3 The TCP Vegas Games

In this section, the games regarding the inherent pricing schemes (for rate allocation) and the parameter setting of TCP Vegas are described and analyzed in detail. We also investigate the impact of the results on the performance of TCP Vegas and on the network as a whole.



3.1 Game 1: Rate Allocation of TCP Vegas 

We consider a network that consists of a set $\mathcal{L} = \{1, 2, \dots, L\}$ of links with capacities $c_l$, $l \in \mathcal{L}$. Assume that the network is shared by a set of flows (sources) $\mathcal{N} = \{1, 2, \dots, N\}$. The rate of flow $i$ is denoted by $x_i$, $i \in \mathcal{N}$. Flow $i$ uses a subset $\mathcal{L}_i \subseteq \mathcal{L}$ of links on its path. The routing matrix $R$ is defined as:

$$R_{li} = \begin{cases} 1 & \text{if } l \in \mathcal{L}_i, \\ 0 & \text{otherwise.} \end{cases}$$

The physical capacity constraints of the flows can therefore be written as:

$$R x \le c \qquad (4)$$

where $x = (x_1, x_2, \dots, x_N)$ is the flow rate vector and $c = (c_1, c_2, \dots, c_L)$ is the link capacity vector. In addition, flow rates cannot be negative:

$$x_i \ge 0, \quad i = 1, 2, \dots, N \qquad (5)$$

The set $\mathcal{A}$ of flow rate vectors that satisfy both conditions (4) and (5) is called the feasible set.
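A small example of the routing matrix and the feasibility test (4)-(5): the sketch below uses a hypothetical two-link line network with three flows, where the first flow crosses both links, and plain Python lists instead of a linear-algebra library.

```python
def routing_matrix(paths, num_links):
    """R[l][i] = 1 if flow i uses link l, else 0 (links and flows 0-indexed)."""
    return [[1 if link in path else 0 for path in paths]
            for link in range(num_links)]


def is_feasible(R, x, c):
    """Check the capacity constraints R x <= c (4) and x >= 0 (5)."""
    if any(xi < 0 for xi in x):
        return False
    loads = [sum(r_li * xi for r_li, xi in zip(row, x)) for row in R]
    return all(load <= cap for load, cap in zip(loads, c))


paths = [{0, 1}, {0}, {1}]   # first flow crosses both links; the others use one each
R = routing_matrix(paths, num_links=2)
print(R)                                     # [[1, 1, 0], [1, 0, 1]]
print(is_feasible(R, [4, 6, 6], [10, 10]))   # True: both links carry 10
print(is_feasible(R, [5, 6, 6], [10, 10]))   # False: both links carry 11
```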

It should be mentioned that our network, like TCP networks in general, assumes feedback-based flow control. The feedback can be implicit (e.g., queueing delay) or explicit (by pricing and/or using Explicit Congestion Notification (ECN)). The sources (end-points) use Vegas-style flow control, as defined in [16], [11]. We consider the flows as the players of the game. The strategy space for a player is the range of its sending rate.

Let us define the following generic payoff function for each player 

Bj{xj) = ctj log(a;j) - ^ / i n(y)dy (6) 

ieCi 

and 7T; = Pi(J2i£Ck a ' fc ) defined as the function of the total flow rates on link 
l. This function is actually the price that is fed back to the player i sending 
at rate a;,, which is an increasing function. The higher the rate, the higher the 
price. Hence, the second term in Equation 6 can be interpreted as the band- 
width cost fed back to player i when it attempts to transmit at rate x,. The first 
term in Equation 6 reflects the gain of player i when transmitting at rate Xi, 
[12] (note that this is a concave function of £,). As a result, the payoff Bi(xi) 
represents the net benefit of player $i$ when transmitting at rate $x_i$. The price
(cost) can be communicated to the end-user (the player) by means of the
total queueing delay of its packets along the path, as in a TCP Vegas/Drop-Tail
network. The price can also be communicated explicitly to the user by using the REM
active queue management scheme (with ECN); here we have a Vegas/REM
network. In the first case, we note that we implicitly use the
PASTA property for Poisson arrivals with the FIFO scheduling principle to derive
the proportional relationship between the total rate arriving at the link and the
queueing delay. Little's formula can also be invoked for this relationship.
This is frequently assumed in the literature, with or without explicit mention.
In any case, if the aggregate arrival flow is not Poisson (e.g. self-similar traffic),
then the queue length (queueing delay) is generally larger than in the Poisson case.
Furthermore, the queueing delay in our model is assumed to be
additive among links. This is true for a Norton network with Poisson arrivals.
So, strictly speaking, our analysis can be considered a worst-case analysis
for a TCP Vegas/Drop-Tail network. For a TCP Vegas/REM network, the additivity
assumption is justified when the marking rates are small. Indeed, let $m_l(t)$ be the
marking probability at link $l$ at time $t$, and let $q_i(t)$ be the end-to-end marking probability
that end-point $i$ observes (and to which its source algorithm reacts). For
small $m_l(t)$,

$$q_i(t) = 1 - \prod_{l \in \mathcal{L}_i} \big(1 - m_l(t)\big) \approx \sum_{l \in \mathcal{L}_i} m_l(t).$$
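The small-marking approximation can be checked numerically. The following Python sketch compares the exact end-to-end marking probability with the sum of the per-link marking probabilities; the marking values are hypothetical illustrations, not data from the paper.

```python
import math


def exact_marking(marks):
    """End-to-end marking probability q_i = 1 - prod_l (1 - m_l)."""
    return 1.0 - math.prod(1.0 - m for m in marks)


def additive_marking(marks):
    """Additive approximation q_i ~ sum_l m_l, valid when every m_l is small."""
    return sum(marks)


# Hypothetical per-link marking probabilities along one path.
path_marks = [0.01, 0.02, 0.005]
print(exact_marking(path_marks))     # about 0.034651
print(additive_marking(path_marks))  # about 0.035
```

The gap between the two values is non-negative and bounded by the sum of pairwise products of the $m_l$, so it shrinks quadratically as the marking rates become small.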

Under the assumptions mentioned above, our problem can be modelled as a
non-cooperative game. The strategy space for a player is its sending rate and is
determined by the capacity of the links. The strategy space for player $i$ can be defined
as $S_i = \{x_i \mid 0 \le x_i \le c^i_{\max}\}$, where $c^i_{\max} = \max\{c_l \mid l \in \mathcal{L}_i\}$. The strategy
space for the game is defined as the Cartesian product $S = \prod_{i=1}^{N} S_i$, which
is equivalent to the feasible set $\mathcal{A}$. A strategy $x = (x_1, x_2, \ldots, x_N) \in \mathcal{A}$ is called a
strategy profile. Each player $i$ chooses its sending rate $x_i$ in the
feasible set in order to maximize its own payoff function $B_i(x_i)$ in a selfish way.
By "in a selfish way" we mean that the player does not care about other players'
payoffs, as long as the rate vector is in the feasible set.

One of the key questions in a non-cooperative flow control game in general, 
and our game in particular, is whether the network converges to (or settles at) 
an equilibrium point, such that no player can increase its payoff by adjusting 
its strategy unilaterally. In the game-theory terminology such a point is called 
a Nash equilibrium. The Nash equilibrium in our game also reflects the balance 
of the gain and the cost for each player as well as for the network as a whole. A 
non-cooperative game may have no Nash equilibrium (in its pure strategy space), 
multiple equilibria, or a unique equilibrium. As for the TCP Vegas game, we can 
prove the following theorem: 

Theorem 1. There exists a unique Nash equilibrium (in its pure strategy space) 
for the TCP Vegas game described above. 

We follow the proof methodologies provided in [5] as well as in a recent paper [9].

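Proof. First, let us consider the existence of the Nash equilibrium for the TCP
Vegas game. Notice that the feasible set $\mathcal{A} = \{x \mid Rx \le c,\ x \ge 0\}$ is a nonempty,
convex and compact set. It is nonempty because $x = (\epsilon, \epsilon, \ldots, \epsilon) \in \mathcal{A}$, where
$0 < \epsilon \le c_{\min}/N$ and $c_{\min} = \min\{c_l \mid l \in \mathcal{L}\}$ (each link carries at most $N$ flows). It is closed
because it is defined by non-strict linear inequalities, and bounded because $x_i \le c_{\max}$, $\forall i \in \mathcal{N}$,
where $c_{\max} = \max\{c_l \mid l \in \mathcal{L}\}$. Assume that $x', x'' \in \mathcal{A}$ and $0 \le \rho \le 1$; then
$\rho x' + (1-\rho)x'' \ge 0$ and

$$R\big(\rho x' + (1-\rho)x''\big) = \rho R x' + (1-\rho) R x'' \le \rho c + (1-\rho) c = c.$$

This result implies the convexity of $\mathcal{A}$.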

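Now let us consider the payoff functions of the players. Notice that $B_i(x_i)$ is
a concave function of $x_i$. Indeed, since each $\pi_l$ is increasing ($\pi_l' \ge 0$):

$$B_i''(x_i) = -\frac{\alpha_i}{x_i^2} - \sum_{l \in \mathcal{L}_i} \pi_l' < 0 \qquad (7)$$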

From what has been discussed so far, our game has the following properties:

1. The joint strategy space is nonempty, convex and compact. 

2. The payoff function of each player is concave in its own strategy space. 

According to Theorem 1 in [5], there exists a Nash equilibrium in its pure strategy 
space. 

For the uniqueness of the Nash equilibrium, let us consider the (nonnegative)
weighted sum of the payoff functions:

$$\sigma(x, w) = \sum_{i=1}^{N} w_i B_i(x), \qquad w_i \ge 0 \qquad (8)$$

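Denote by $g(x, w)$ the pseudo-gradient of $\sigma(x, w)$; then the Jacobian of $g(x, w)$
with respect to $x$ can be computed as follows:

$$G(x, w) = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1N} \\ B_{21} & B_{22} & \cdots & B_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ B_{N1} & B_{N2} & \cdots & B_{NN} \end{pmatrix}$$

where

$$B_{ij} = \begin{cases} -w_i\Big(\dfrac{\alpha_i}{x_i^2} + \sum_{l \in \mathcal{L}_i} \pi_l'\Big) < 0, & j = i, \\ -w_i \sum_{l \in \mathcal{L}(ij)} \pi_l' \le 0, & j \ne i,\ \mathcal{L}(ij) \ne \emptyset, \\ 0, & j \ne i,\ \mathcal{L}(ij) = \emptyset, \end{cases}$$

and $\mathcal{L}(ij) = \mathcal{L}_i \cap \mathcal{L}_j$. The matrix $G$ defined above is thus negative definite. As
a result, according to Theorem 6 in [5], $\sigma(x, w)$ is diagonally strictly concave.
According to Theorem 2 in [5], the equilibrium point of the TCP Vegas game is
unique.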
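As a concrete instance of the uniqueness argument, consider a hypothetical case of two flows sharing one link with an assumed linear price $\pi(y) = \kappa y$, so that $\pi' = \kappa$. The Python sketch below builds $G$ from the expression for $B_{ij}$ and tests negative definiteness through its symmetric part, which is what the quadratic form $z^\top G z$ depends on. With equal weights $w_i = 1$, $G$ is minus a positive diagonal matrix minus $\kappa \mathbf{1}\mathbf{1}^\top$, so the check succeeds; strongly unequal weights can make it fail, which is why Rosen's theorem [5] only asks for some suitable $w$.

```python
def jacobian_two_flows(x, alpha, w, kappa):
    """Jacobian G of the pseudo-gradient for two flows on one shared link.

    Assumes the hypothetical linear price pi(y) = kappa * y, so pi' = kappa.
    """
    return [
        [-w[0] * (alpha[0] / x[0] ** 2 + kappa), -w[0] * kappa],
        [-w[1] * kappa, -w[1] * (alpha[1] / x[1] ** 2 + kappa)],
    ]


def is_negative_definite(g):
    """z^T G z < 0 for all z != 0 iff the symmetric part (G + G^T)/2 is
    negative definite; for a 2x2 matrix: s11 < 0 and det(S) > 0."""
    s11, s22 = g[0][0], g[1][1]
    s12 = (g[0][1] + g[1][0]) / 2.0
    return s11 < 0 and s11 * s22 - s12 * s12 > 0


print(is_negative_definite(jacobian_two_flows([1.0, 2.0], [1.0, 1.0], [1.0, 1.0], 0.5)))     # True
print(is_negative_definite(jacobian_two_flows([1.0, 1.0], [0.01, 0.01], [1.0, 100.0], 1.0)))  # False
```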

Remark 1. To reach this equilibrium, [5] shows that each player can change 
its own strategy at a rate proportional to the gradient of its payoff function 
with respect to its strategy and subject to constraints. This method is in fact 
equivalent to the gradient projection algorithms described in [10]. 
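The adjustment rule of Remark 1 can be sketched in discrete time. In the Python code below, each player moves its rate along the derivative of its own payoff (6), $\alpha_i/x_i - \sum_{l \in \mathcal{L}_i} \pi_l$, and projects the result back onto its strategy set, bounded above by $c^i_{\max}$. The network, the capacities, the weights $\alpha_i$ and the linear price $\pi_l = y_l / c_l$ are hypothetical choices for illustration; the step size and the small lower bound $\epsilon$ (which keeps $\log x_i$ defined) are likewise assumptions.

```python
# Hypothetical example: two links and three flows.
CAPACITY = [10.0, 6.0]        # c_l for links 0 and 1 (assumed values)
ROUTES = [[0, 1], [0], [1]]   # L_i: links used by flow i (assumed topology)
ALPHA = [1.0, 1.0, 1.0]       # alpha_i in payoff (6) (assumed values)


def link_prices(x):
    """Assumed increasing price: pi_l = (aggregate rate on link l) / c_l."""
    prices = []
    for l, c in enumerate(CAPACITY):
        load = sum(x[i] for i, route in enumerate(ROUTES) if l in route)
        prices.append(load / c)
    return prices


def marginal_payoff(x, i, prices):
    """Derivative of B_i with respect to x_i: alpha_i / x_i - sum of path prices."""
    return ALPHA[i] / x[i] - sum(prices[l] for l in ROUTES[i])


def gradient_projection(step=0.1, iters=5000, eps=1e-6):
    """Synchronous projected gradient ascent: every flow reacts to current prices."""
    x = [1.0] * len(ROUTES)
    upper = [max(CAPACITY[l] for l in route) for route in ROUTES]  # c^i_max
    for _ in range(iters):
        prices = link_prices(x)
        x = [
            min(max(x[i] + step * marginal_payoff(x, i, prices), eps), upper[i])
            for i in range(len(x))
        ]
    return x


rates = gradient_projection()
print([round(r, 3) for r in rates])
```

At convergence each marginal payoff is numerically zero, i.e. no flow can gain by a small unilateral change of its rate, which is the equilibrium condition discussed above.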
Remark 2. The authors of [10], using an optimization framework, also showed that,
under certain assumptions on the step size, these algorithms converge to a
system-wide optimal point (which is also proved to be unique). Furthermore, it is
proved in [11], [12] that the rate control of TCP Vegas/Drop-Tail and TCP
Vegas/REM is indeed based on these algorithms. This implies that the TCP Vegas
game described above converges to a unique Nash equilibrium that is system-wide
optimal.



3.2 Game 2: Parameter Setting of TCP Vegas 

In this game, we consider the parameter setting of TCP Vegas. As described
in [16], TCP Vegas tries to maintain the number of backlogged packets in the
network between $\alpha$ and $\beta$. We examine here the situation in which a selfish (and
greedy) user tries to increase the number of its backlogged packets in the network
in order to grab more bandwidth. If all other players do the same
thing (i.e. they are also selfish and greedy), the total number of packets in
the network would increase without bound. However, router buffers are finite,
so packet loss would occur, reducing the throughput of
the connections. We are interested in a situation (i.e. a parameter setting, if
one exists) from which no player would deviate.

We consider a simple topology of $N$ TCP Vegas sources sharing a single
bottleneck link with a buffer size of $B$ packets. Source $i$ is associated with a
parameter pair $(\alpha_i, \beta_i)$. In this paper, we deal with the case $\alpha_i = \beta_i$. The case
$\alpha_i \ne \beta_i$ is left for future work.

Players: $N$ TCP Vegas flows

Actions: Each player can set its parameter $\alpha_i$ in order to control the
number of its backlogged packets in the queue of the bottleneck link (with
capacity $\mu$ and delay $\tau$). The router is assumed to use the Drop-Tail mechanism
(FIFO principle).

Payoff: $f(\alpha_i) = \lambda_i$ (the average throughput)

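If the total number of backlogged packets is smaller than the buffer size at the
bottleneck router (i.e. $\sum_{j=1}^{N} \alpha_j < B$), then the payoff function of player $i$ can be
expressed as follows:

$$f(\alpha_i) = \lambda_i = \frac{\mu\,\alpha_i}{\sum_{j=1}^{N} \alpha_j} \qquad (9)$$

From Equation 9 we have:

$$\frac{\partial f}{\partial \alpha_i} = \frac{\mu \sum_{j \ne i} \alpha_j}{\big(\sum_{j=1}^{N} \alpha_j\big)^2} \qquad (10)$$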
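Equations 9 and 10 can be checked with a few lines of Python: the sketch compares the closed-form derivative with a finite difference and shows that raising $\alpha_i$ raises player $i$'s throughput while $\sum_j \alpha_j < B$. The values of $\mu$ and $\alpha$ used here are arbitrary illustrations.

```python
def vegas_throughput(alphas, i, mu):
    """Equation 9: lambda_i = mu * alpha_i / sum_j alpha_j (valid while sum_j alpha_j < B)."""
    return mu * alphas[i] / sum(alphas)


def vegas_marginal(alphas, i, mu):
    """Equation 10: d lambda_i / d alpha_i = mu * sum_{j != i} alpha_j / (sum_j alpha_j)^2."""
    others = sum(a for j, a in enumerate(alphas) if j != i)
    return mu * others / sum(alphas) ** 2


mu = 100.0                 # illustrative bottleneck capacity
alphas = [2.0, 3.0, 5.0]   # illustrative settings; sum = 10, assumed < B
h = 1e-6
bumped = [alphas[0] + h] + alphas[1:]
finite_diff = (vegas_throughput(bumped, 0, mu) - vegas_throughput(alphas, 0, mu)) / h
print(vegas_marginal(alphas, 0, mu))   # 8.0
print(round(finite_diff, 3))           # 8.0
```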



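Since $\sum_{j \ne i} \alpha_j$ is always positive, it follows from Equation 10 that $\partial f / \partial \alpha_i > 0$, $\forall i$.
This implies that, given the other players' strategies, player $i$ will set $\alpha_i$ as high as
possible in order to maximize its payoff. Notice that Equation 9 is valid only
if $\sum_{j=1}^{N} \alpha_j < B$. Otherwise, TCP Vegas, according to [17], behaves exactly like
TCP Reno. In this case, there are two possibilities [17]:

$$\lambda_i^{Reno} = \frac{\mu}{N} \quad \text{if } \omega \le \frac{\gamma B}{1-\gamma}, \qquad \lambda_i^{Reno} < \frac{\mu}{N} \quad \text{otherwise} \qquad (11)$$

In the second case the link is not fully utilized; the exact expression for $\lambda_i^{Reno}$ in that case is given in [17].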



Thus, we have two cases:

Case 1: $\omega \le \gamma B / (1-\gamma)$

It is important to note that in this case the link is fully utilized both for TCP
Vegas and TCP Reno. Furthermore, with TCP Reno-style performance, the bandwidth
is fairly (equally) shared between flows (because they have the same RTT).
Let $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_N^*)$ be a Nash equilibrium of the game in this case.
Without loss of generality, we can assume that $\alpha_1^* \le \alpha_2^* \le \ldots \le \alpha_N^*$. Notice that
in a Nash equilibrium we must have $\alpha_1^* = \alpha_2^* = \ldots = \alpha_N^*$. Otherwise, player 1 has
an incentive to deviate (i.e. to increase its number of backlogged packets $\alpha_1$)
in order to get a higher throughput, because with Reno-style performance it would
get a fair share of the total bandwidth (i.e. $\mu/N$). As a result, the Nash
equilibria for this game are $\alpha^* = (\alpha^*, \ldots, \alpha^*)$, where $\alpha^* \ge \lceil B/N \rceil$. This means
that, in this case, the parameter $\alpha$ can be arbitrarily large at a Nash equilibrium.

Case 2: $\omega > \gamma B / (1-\gamma)$

In this case, the link is not fully utilized. Following similar reasoning as in
Case 1, we have a set of Nash equilibria defined as follows: $E = \{\alpha = (\alpha_1, \ldots, \alpha_N) \mid \alpha_1 \le \alpha_2 \le \ldots \le \alpha_N\}$,
with the conditions that $\sum_{i=1}^{N} \alpha_i = B - 1$ and that the Vegas throughput of player 1,
$\mu\alpha_1/(B-1)$ from Equation 9, is at least its Reno throughput $\lambda_1^{Reno}$ from Equation 11. The latter
condition simply means that even player 1 (who gets the smallest bandwidth) would not
deviate, so no other player would deviate either. If this condition does not hold, player 1 would deviate to get a higher
bandwidth share.
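The membership test for the Case 2 equilibrium set can be sketched as follows. The per-flow Reno throughput for this case is passed in as a parameter (`reno_rate`) because the sketch does not reproduce its closed form; the non-deviation condition is read, as in the text, as player 1's Vegas throughput from Equation 9 (with $\sum_j \alpha_j = B - 1$) being at least the Reno throughput it would obtain by deviating. All numbers are illustrative.

```python
def in_case2_equilibrium_set(alphas, buffer_size, mu, reno_rate):
    """Check membership in the Case 2 set E described above.

    reno_rate: per-flow TCP Reno throughput from Equation 11 (supplied by caller).
    """
    ordered = sorted(alphas)             # alpha_1 <= alpha_2 <= ... <= alpha_N
    if sum(ordered) != buffer_size - 1:  # backlog must total B - 1 packets
        return False
    # Player 1 (smallest alpha) gets mu * alpha_1 / (B - 1) under Equation 9;
    # it does not deviate if that is at least what Reno behaviour would give it.
    return mu * ordered[0] / (buffer_size - 1) >= reno_rate


print(in_case2_equilibrium_set([3, 4, 2], buffer_size=10, mu=90.0, reno_rate=15.0))  # True
print(in_case2_equilibrium_set([3, 4, 2], buffer_size=10, mu=90.0, reno_rate=25.0))  # False
```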

Our final comment on these Nash equilibria is that each TCP Vegas
flow (player) keeps as many of its own packets backlogged as
possible. As a result, the buffer is nearly full and the queueing delay is
unnecessarily high. A nearly full buffer may cause many difficulties for TCP Vegas
(e.g. the estimate of baseRTT may be inaccurate if there are already many
packets in the queue when the connection starts).

4 Conclusion 

We have demonstrated, using a game-theoretic approach, how TCP Vegas'
inherent pricing schemes as well as its parameter setting impact its
performance. Our analysis shows that these inherent pricing schemes result in a rate
control equilibrium state that is a Nash equilibrium in game-theoretic terms
and is also a global optimum of all-Vegas networks. We also proved that
the parameter setting of TCP Vegas is very vulnerable to selfish actions of the
users. This poses a serious threat to the possible deployment of FAST TCP in
the future Internet.
Abstract. This paper describes BarterRoam: a novel mobile and wireless
roaming settlement model and clearance methodology, based on the concept of
sharing and bartering excess capacities for usage in Visiting Wireless Internet
Service Provider (WISP) coverage areas with the Home WISP. The methodology is not
limited to WISPs; it is applicable to virtual WISPs and any Value-Added Services
in the mobile and wireless access environments. In Broadband Public Wireless
Local Area Network (PWLAN) environments, every WISP provides its own
coverage at various locations or Hotspots. The most desirable option to help WISPs
reduce the cost of providing a wider coverage area is for the Home and Visiting WISPs
to collaborate for seamless customer access via bilateral or multilateral agreement
and proxy RADIUS authentication [1]. This is termed a roaming agreement. Due
to the large number of WISPs desiring to enter the market, bilateral or
multilateral roaming agreements become complex and unmanageable. The traditional
settlement model is usually based on customer usage plus margin. In the
broadband PWLAN environment, most WISPs and customers prefer flat-rated services
so that they can budget expenses accordingly. The current settlement model is
not able to handle the preferred flat-rated settlement. Hence, a novel flat-rated
settlement model and clearance methodology for wireless network environments
is proposed to enable multiple service providers to trade their excess capacities
and to minimize cash outflow among the service providers via barter trade.
We are unaware of other comparative work in this area.



1 Introduction 

With wireless and mobile networks evolving into Internet networks that give users
wireless access to email, web browsing and all available Internet services, comes the
challenge of charging and billing for the wireless roaming, services, and applications
provided. Provision of these services on wireless and mobile networks presents unique
challenges to the service providers, especially in the areas of metering and accounting,
Quality of Service (QoS), Service Level Agreements (SLAs), authentication, wireless
roaming, etc. Wireless Internet Service Provider (WISP) roaming differs from 2G-style
roaming because the Internet model is different from the 2G roaming framework.
2G evolved from the Public Switched Telecommunications Network (PSTN) model,
which features relatively dumb user devices, centralized control,
strict network hierarchies, and a circuit-switched basic service. The Internet model
features powerful user terminals, end-to-end service deployment, a core network, and a
best-effort TCP/IP packet-forwarding service. In addition, the Internet is characterized by the
flexibility provided by IP subnets and Network Address Translation (NAT) islands. Thus,
2G-style roaming agreements, in which each party must provide connectivity to
all requesting customers of another party, are inflexible and could create imbalances
because roaming traffic is not symmetrical.

In the broadband Public Wireless Local Area Network (PWLAN) environment, every
Wireless Internet Service Provider (WISP) provides its own coverage at various locations
(Hotspots). Roaming between Hotspots belonging to different WISPs offers mobility and
more access points. By signing a bilateral roaming agreement, WISPs allow their
end-users to use each other's networks. Today, an end-user travelling abroad can use
another WISP's network in only two cases. First, a local pre-paid (scratch) card allows time-
or Kbyte-limited access to the visited network [4]. The service is bought from a local
WISP to access its public Hotspots. The end-user becomes a client of the foreign
WISP and is, by definition, not a roamer. This is also referred to as Plastic Roaming.
Secondly, a subscription is bought with a broker or aggregator [1], who allows the
usage of all networks in the alliance. Strictly speaking, this is not a roaming scenario, as
the end-user pays its charges to the aggregator or broker and does not, in the strictest
sense, have a home network.

Most of the roaming issues can easily be solved between two WISPs. The
authentication and billing processes can simply be implemented as long as both networks agree
on the process. However, in a market with a multitude of networks with a variety of
backgrounds, it is necessary to implement workable standards so that the whole market
can benefit from straightforward roaming procedures. Some vendors and players are developing
Subscriber Identity Module (SIM) [7], [6] card-based authentication modules that
use the mobile system's Home Location Register (HLR) and Visitor Location Register
(VLR) for roaming settlement; it will be difficult to introduce this into the whole market.
Although SIM card authentication is a very valuable and proven method, it will be
necessary to agree on certain authorization procedures. These will include workable
operating procedures that deal with interoperability between technologies and authentication.

In the situation where the Home WISP's customers enter another, Visiting WISP's
coverage area where the Home WISP does not have coverage, the following scenarios
could happen:

- The customer does not have access: although his hardware is capable of gaining
access, he is not authorized.

- The customer can gain access by signing up with the Visiting WISP, either at an hourly
rate or as a new subscriber.

- The Home and Visiting WISPs collaborate to allow the customer seamless access via a
bilateral agreement and proxy RADIUS [2] authentication. This is termed a roaming
arrangement.
The third option is the most desirable because it helps WISPs reduce the cost of providing
a wider coverage area by sharing networks [8].

Due to the large number of WISPs desiring to enter the market, bilateral or
multilateral roaming arrangements become complicated and unmanageable as the
number of WISPs in the ecosystem increases. A settlement house similar to the concept
of a credential center [9], managing the roaming arrangements, is needed. The traditional
settlement model, as mentioned earlier, is usually based on customer usage plus margin.
In the broadband PWLAN environment, most WISPs and customers prefer flat-rated
services because they can budget expenses accordingly.

In this paper, we propose BarterRoam: a novel mobile and wireless roaming
settlement clearance model, based on the concept of sharing and bartering excess capacities
for usage in Visiting WISP coverage areas. A bilateral or multilateral roaming agreement
is assumed to have been established between the WISPs, so that the proposed roaming
settlement system can be implemented. In addition to the roaming agreement, we assume
that incentive mechanisms [5] or a routing protocol that achieves
truthfulness and cost-efficiency in a game-theoretic sense [3] exist, to ensure that
roaming domains offer the amount of resources that is economically justifiable for
contribution. The methodology is not limited to WISPs; it is applicable to virtual WISPs
and value-added services. The objectives of this settlement methodology are:

- To enable multiple service providers to trade their excess capacities for usage in
other service providers' service coverage areas;

- To minimize cash outflow from any service provider by enabling the trade of excess
capacities via barter trade.



2 Background 

2.1 Assumptions 

We assume that every WISP builds its service with excess capacity, or headroom, to
maintain the desired Quality of Service (QoS). This excess capacity is not used and is
wasted. Therefore, after provisioning sufficient resources with the desired QoS for their
own customers, WISPs benefit from giving other WISPs' customers access to the excess
service and network capacity, thus increasing revenue and utilization. We also assume
that bilateral or multilateral roaming agreements among WISPs have to be established before
the service can be offered to other WISPs' customers, and that WISPs' customers will then have
more pervasive roaming coverage. This provides more value-added benefits to
customers.

2.2 Concepts and Working Models 

The settlement methodology works as follows: 

- Every WISP allows Proxy RADIUS authentication via the settlement clearing
server and therefore allows Home WISP users to roam into the Visiting WISP's
coverage area.

- Every WISP computes its excess capacity based on the number of Hotspots
owned, Hotspot capacities, number of customers, average customer usage and
operation duration. This excess capacity is then contributed to the settlement
ecosystem.

- Based on the capacity computed, the Visiting WISPs in the entire ecosystem allow
the Home WISP's users to use up to the same amount of excess capacity that the Home
WISP contributed.

- The settlement is done monthly or upon an agreed periodic billing cycle.

If the Home WISP's users use less than the Home WISP's contribution, no
additional settlement is needed. In this situation, the excess capacity is wasted, but to
a lesser extent. If the total utilization of the Home WISP's users exceeds the
Home WISP's contributed amount, the following settlement computation applies:

- The Home WISP's user usage percentage distribution amongst the Visiting WISP areas
is computed.

- The retail flat rate of each Visiting WISP is used to compute the total due to that
Visiting WISP.

- The wholesale discount is applied to compute the actual amount due.



3 Proposed Novel Settlement Model and Methodology Flow 

3.1 Stage I - Due Diligence 

Before a WISP can join the network, the following information is needed to ensure that
the new WISP is compatible in pricing and QoS level:

- Number of Hotspots ($N_{Hotspot}$) - The number of wireless service coverage areas.

- Hotspot capacity ($N_{HotspotUsers}$) - The average number of concurrent users
that can gain access to the service. The capacity is usually determined by the brand and
type of access points (AP) used, i.e. the number of users an AP supports.

- Operating hours ($H$) - The time during which users can gain access to the network.
Operating hours are usually in sync with the Hotspot opening hours, but are not necessarily
limited to them. The maximum is 24 hours per day.

- Operating days ($D$) - The days in a month on which the Hotspot is open for business.
The norm used in this settlement methodology is up to 30 days per month.

- Number of users ($N_{Users}$) - The total number of signed-up users that the WISP had
acquired at the time of application, in Stage I.

- Monthly average utilization per user ($\mu$) - This is computed from the RADIUS log at
the time of WISP application, in Stage I. For example, if a user's utilization pattern
is 2 hours per day on 20 days per month, the monthly average utilization per user
within the Home WISP network is $2 \times 20 = 40$ hours.

- Retail flat rate ($\alpha$) - The monthly flat rate the Home WISP bills its customers.

- Wholesale discount ($\beta$) - The discount from the flat-rated retail hourly price at which
the Visiting WISP sells the service to the Home WISP.

- Average Hotspot rental price ($\gamma$) - The property rental price that the venue
operator pays for that location, i.e. the average property space rental price per sq. ft.
in the property market.

The parameters will be revised at each billing cycle, or at a pre-agreed periodic cycle,
taking into consideration changes such as Hotspots that are not operational, new Hotspots,
additional users, etc. With all the above parameters, the following secondary parameters are
computed:

- Total WISP Capacity - $CP_{WISPtotal} = N_{Hotspot} \times N_{HotspotUsers} \times H \times D$

Unit: Hours

- Own Utilization - $CP_{WISPutilization} = N_{Users} \times \mu$

Unit: Hours

- Excess capacity - $CP_{WISPexcess} = CP_{WISPtotal} - CP_{WISPutilization}$

Unit: Hours

- WISP hourly rate - $P_{WISPhourly} = \alpha / \mu$

Unit: Dollars per hour

- WISP contribution - $\Phi_{WISPcontrib} = CP_{WISPexcess} \times P_{WISPhourly}$

Unit: Dollars

- Broadband Value Index (BVI) - This index is introduced to ensure that WISP
pricing and contributions are equitable amongst the WISPs within the same city, where
the living standard and QoS should not fluctuate too much. Unit: None

jgyj — I 1 r (l-0)xaxN„ ot spot \ 

'y Nusers f 

Note that for a virtual WISP, the number of Hotspots is 0 (zero) and the contribution
amount is $0 (zero dollars). Instead, there will be a pledge for [average user utilization],
and this data is available from the WISPs with which the virtual WISP collaborates.
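The secondary parameters can be reproduced from Table 1 with a short Python sketch. It recomputes the capacity, utilization, excess, hourly rate, contribution and contribution-share columns of Table 2 for the four physical WISPs; virtual WISP E is omitted because, as noted above, its contribution is fixed at zero. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Wisp:
    """Stage I inputs for one WISP (field names are illustrative)."""
    n_hotspot: int        # N_Hotspot
    n_hotspot_users: int  # N_HotspotUsers, concurrent users per Hotspot
    hours: int            # H, operating hours per day
    days: int             # D, operating days per month
    n_users: int          # N_Users
    mu: float             # monthly average utilization per user (hours)
    alpha: float          # retail flat rate ($ per month)

    def total_capacity(self):      # CP_WISPtotal (hours)
        return self.n_hotspot * self.n_hotspot_users * self.hours * self.days

    def own_utilization(self):     # CP_WISPutilization (hours)
        return self.n_users * self.mu

    def excess(self):              # CP_WISPexcess (hours)
        return self.total_capacity() - self.own_utilization()

    def hourly_rate(self):         # P_WISPhourly ($ per hour)
        return self.alpha / self.mu

    def contribution(self):        # Phi_WISPcontrib ($)
        return self.excess() * self.hourly_rate()


# Values from Table 1 (WISPs A-D).
WISPS = {
    "A": Wisp(20, 20, 24, 30, 1000, 40.0, 80.0),
    "B": Wisp(100, 20, 12, 30, 8000, 35.0, 49.0),
    "C": Wisp(60, 16, 12, 30, 2000, 30.0, 30.0),
    "D": Wisp(10, 16, 12, 20, 200, 40.0, 80.0),
}

total_contrib = sum(w.contribution() for w in WISPS.values())
for name, w in WISPS.items():
    share = 100.0 * w.contribution() / total_contrib
    print(name, w.total_capacity(), w.excess(), round(w.contribution()), f"{share:.2f}%")
```

The printed values agree with Table 2, e.g. for WISP A: 288000 total hours, 248000 excess hours, a $496000 contribution and a 34.01% share.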

Once the new WISP is confirmed to have excess capacity and its BVI is within the
allowable range (e.g. within the standard deviation and/or mean range of values of the
current service providers), it is included in the ecosystem, and a contract is entered
into with the WISP that includes the following:

- Confirmed wholesale discount ($\beta$)

- Confirmed QoS commitment 

- Confirmed settlement period 

- Confirmed commitment to the ecosystem period 

- Confirmed per-user flat rate payable to the settlement organization to facilitate
roaming authentication

- Confirmed per-transaction fee payable to the settlement organization to facilitate
settlement computation

Once the service contract is entered into, the list of Hotspots of the new WISP will be
published. The system for the inclusion of a new WISP and for QoS reporting is illustrated in
Figure 1 and Figure 2.

Table 1 and Table 2 show the assumptions and the Stage I parameter computations.
The standard deviation and mean of the BVI, 0.997 and 1.462 respectively, are obtained
from the list of WISPs' BVIs. Assumptions made for the
initial parameters:
Fig. 1. System Description for inclusion of new WISP




Fig. 2. System Description for QoS Report 
Table 1. Assumptions for Parameters

WISP         N_Hotspot  N_HotspotUsers  H (hours)  D   N_Users  μ (hours)  α       β    γ ($ per sq. ft.)
A            20         20              24         30  1000     40         $80     30%  $4
B            100        20              12         30  8000     35         $49     20%  $3
C            60         16              12         30  2000     30         $30     30%  $8
D            10         16              12         20  200      40         $80     40%  $2
E (Virtual)  0          0               0          0   2000     41         $67.20  0%   $0


Table 2. Assumptions for Secondary Parameters Computations

WISP         CP_WISPtotal  CP_WISPutilization  CP_WISPexcess  P_WISPhourly  Φ_WISPcontrib  %Φ_WISPcontrib  BVI
A            288000        40000               248000         $2.00         $496000        34.01%          1.37
B            720000        280000              440000         $1.40         $616000        42.24%          0.82
C            345600        60000               285600         $1.00         $285600        19.58%          0.76
D            38400         8000                30400          $2.00         $60800         4.17%           2.90
E (Virtual)  0             82000               0              $1.64         $0             0%              Nil


1. All WISPs have excess capacities that they are willing to barter trade.

2. Each Hotspot's access point (AP) can handle up to 20 concurrent users.

3. Each Hotspot has an average seating capacity of 80 patrons, of whom 20% access
the service concurrently. This works out to 16 concurrent users.

4. All Hotspots operate up to 24 hours a day, on average up to 30 days a month.
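The BVI mean and standard deviation quoted with Tables 1 and 2 can be reproduced from the four BVI values in Table 2. The quoted 0.997 corresponds to the sample standard deviation (divisor $n-1$), which is what Python's `statistics.stdev` computes; the sketch also checks the arithmetic of assumption 3.

```python
import statistics

# BVI values of WISPs A-D from Table 2 (virtual WISP E has no BVI).
bvi = [1.37, 0.82, 0.76, 2.90]

mean_bvi = statistics.mean(bvi)
std_bvi = statistics.stdev(bvi)   # sample standard deviation (n - 1)
print(round(mean_bvi, 4), round(std_bvi, 3))  # 1.4625 0.997

# Assumption 3: 20% of 80 seated patrons use the service concurrently.
print(80 * 20 // 100)  # 16
```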



3.2 Stage II - Settlement Computation Models 

In the situation where the Home WISP's users' utilization in the Visiting WISPs' service
areas exceeds the Home WISP's contribution, the settlement is done by the following
computation procedure. As illustrated in Table 3, we assume the utilization in hours
at the Visiting WISPs (note that E is a virtual WISP). First, the system tabulates the
amount of utilization at each Visiting WISP, valued at that Visiting WISP's hourly rate:

$$Amount_{Visiting} = CP_{VisitingUtilization} \times P_{WISPhourly}$$

The system then checks for over-capacity by computing the exceeded amount, to ensure
the QoS of the ecosystem (the exceeded amount is taken as zero when negative):

$$Amount_{Exceed} = \sum Amount_{Visiting} - \Phi_{WISPcontrib}$$

In this example, users of WISP B and WISP D did not exceed their contribution, as illustrated in Table 5.
The system further checks for Visiting WISPs' capacity overload, to ensure that the total
utilization by visiting users is within the overall available network excess capacity,
i.e. $\sum CP_{WISPVisitingtotal} \le CP_{WISPexcess}$, as shown in Table 4. With
$Amount_{Visiting}$ computed, the utilization distribution for the Home WISP at the Visiting
WISPs is then tabulated using the expression below and illustrated in Table 6:

$$\%Amount_{Visiting} = \frac{Amount_{Visiting}}{\sum Amount_{Visiting}} \times 100$$

This ensures a fair distribution of the $Amount_{Exceed}$ collected from the Home WISPs. The final
stage, the computation of the amount a WISP actually pays for its users' utilization of the
other Visiting WISPs' networks, is done as follows:

$$Amount_{Payable} = \%Amount_{Visiting} \times Amount_{Exceed} \times \beta$$
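The Stage II procedure can be sketched end to end in Python using the data of Tables 2 and 3. Following the paper's formula, the payable amount multiplies the distribution share and the exceeded amount by the payee's wholesale discount $\beta$, and a negative exceeded amount is treated as zero, as Tables 5 and 7 do for WISPs B and D. Dictionary and function names are illustrative.

```python
RATE = {"A": 2.0, "B": 1.4, "C": 1.0, "D": 2.0}               # P_WISPhourly (Table 2)
CONTRIB = {"A": 496000, "B": 616000, "C": 285600, "D": 60800, "E": 0}
EXCESS = {"A": 248000, "B": 440000, "C": 285600, "D": 30400}  # CP_WISPexcess
BETA = {"A": 0.3, "B": 0.2, "C": 0.3, "D": 0.4}               # wholesale discount

# Table 3: hours used by each Home WISP's users at each Visiting WISP.
USAGE = {
    "A": {"B": 320000, "C": 80000, "D": 0},
    "B": {"A": 20000, "C": 100000, "D": 10000},
    "C": {"A": 150000, "B": 10000, "D": 2000},
    "D": {"A": 1000, "B": 3000, "C": 5000},
    "E": {"A": 6000, "B": 6000, "C": 52800, "D": 18000},
}


def capacity_ok():
    """Table 4 check: visiting hours at each WISP stay within its excess capacity."""
    return {v: sum(u.get(v, 0) for u in USAGE.values()) <= cap
            for v, cap in EXCESS.items()}


def settle(home):
    """Amount_Payable from Home WISP `home` to each Visiting WISP."""
    amounts = {v: hours * RATE[v] for v, hours in USAGE[home].items()}  # Amount_Visiting
    total = sum(amounts.values())
    if total == 0:
        return {v: 0.0 for v in amounts}
    exceed = max(0.0, total - CONTRIB[home])                            # Amount_Exceed
    return {v: (amt / total) * exceed * BETA[v] for v, amt in amounts.items()}


print(capacity_ok())
for home in USAGE:
    payable = settle(home)
    print(home, {v: round(p) for v, p in payable.items()}, round(sum(payable.values())))
```

Rounded to whole dollars, the results match Table 7 (e.g. WISP A pays $5430 to WISP B and $1455 to WISP C, $6885 in total).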
Table 3. Utilizations in hours at the Visiting WISPs, CP_VisitingUtilization

                      WISP A   WISP B   WISP C   WISP D   WISP E   Σ CP_VisitingUtilization
A User                Nil      320000   80000    0        Nil      400000
B User                20000    Nil      100000   10000    Nil      130000
C User                150000   10000    Nil      2000     Nil      162000
D User                1000     3000     5000     Nil      Nil      9000
E User                6000     6000     52800    18000    Nil      82800
CP_WISPVisitingtotal  177000   339000   237800   30000    Nil




Table 4. Check that utilization is within the overall available network excess capacity,
Σ CP_WISPVisitingtotal ≤ CP_WISPexcess

                        WISP A   WISP B   WISP C   WISP D   WISP E
Σ CP_WISPVisitingtotal  177000   339000   237800   30000    Nil
CP_WISPexcess           248000   440000   285600   30400    Nil
Check                   Okay     Okay     Okay     Okay     Nil



Table 5. Utilization Amount at the Visiting WISPs, Amount_Visiting = CP_VisitingUtilization × P_WISPhourly

         WISP A    WISP B    WISP C    WISP D    Σ Amount_Visiting  Amount_Exceed
A User   Nil       $448000   $80000    $0        $528000            $32000
B User   $40000    Nil       $100000   $20000    $160000            $0
C User   $300000   $14000    Nil       $4000     $318000            $32400
D User   $2000     $4200     $5000     Nil       $11200             $0
E User   $12000    $8400     $52800    $36000    $109200            $109200



Table 6. Utilization distribution for the Home WISP at the Visiting WISPs,
%Amount_Visiting = Amount_Visiting / Σ(Amount_Visiting) × 100

Payer \ Payee           WISP A   WISP B   WISP C   WISP D   WISP E
Wholesale Discount (β)  30%      20%      30%      40%      Nil
WISP A                  Nil      84.85%   15.15%   0%       Nil
WISP B                  25.00%   Nil      62.50%   12.50%   Nil
WISP C                  94.34%   4.40%    Nil      1.26%    Nil
WISP D                  17.86%   37.50%   44.64%   Nil      Nil
WISP E                  10.99%   7.69%    48.35%   32.97%   Nil



Note: Further contra settlement can be achieved by netting the WISPs' payments, e.g.
WISP C need only pay WISP A $9170 - $1455 = $7715, instead of the full amount being
transferred. Table 7 illustrates the computation; the WISPs' total amounts payable,
Σ(Amount_Payable), are also computed.
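The contra settlement in the note above amounts to bilateral netting. A minimal Python sketch, using the rounded amounts of Table 7 as input, offsets each pair of opposite payments so that only the net difference is transferred; it is one straightforward reading of the note, not a procedure specified in the paper.

```python
def net_bilateral(payments):
    """Offset opposite payments: payments[(payer, payee)] = amount in dollars."""
    net = {}
    for (payer, payee), amount in payments.items():
        reverse = payments.get((payee, payer), 0)
        if amount > reverse:   # only the larger direction survives, reduced by the reverse flow
            net[(payer, payee)] = amount - reverse
    return net


# Non-zero entries of Table 7: (payer, payee) -> Amount_Payable.
TABLE7 = {
    ("A", "B"): 5430, ("A", "C"): 1455,
    ("C", "A"): 9170, ("C", "B"): 285, ("C", "D"): 163,
    ("E", "A"): 3600, ("E", "B"): 1680, ("E", "C"): 15840, ("E", "D"): 14400,
}

netted = net_bilateral(TABLE7)
print(netted[("C", "A")])    # 7715
print(("A", "C") in netted)  # False
```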











E.K. Lua et al. 



Table 7. Computation of the actual amount payable,
Amount_Payable = %Amount_Visiting × Amount_Exceed × β

Payer \ Payee           WISP A   WISP B   WISP C   WISP D   WISP E   Total Payable Σ(Amount_Payable)
Wholesale Discount (β)  30%      20%      30%      40%      Nil
WISP A                  Nil      $5430    $1455    $0       Nil      $6885
WISP B                  $0       Nil      $0       $0       Nil      $0
WISP C                  $9170    $285     Nil      $163     Nil      $9618
WISP D                  $0       $0       $0       Nil      Nil      $0
WISP E                  $3600    $1680    $15840   $14400   Nil      $35520



3.3 Stage III - Reconfirmation Check 



The settlement organization will further check to see if the WISP’s users are consuming 
too much usage at the Visiting WISPs. This is done by computing the following param- 
eters for each WISP and performing matching with the average hour/user/month. From 
Stage I and II, for each WISP, the average cost per user and total utilization hour are 
obtained. Thus, for each WISP, the utilization at the Visiting WISPs can be computed as 
the average hour/user/month. The parameters’ computations are illustrated below: 



- average cost per user 

- total utilization hour 



(Amount. Payable) 
N v 

sers 

C Py isitingUtili 



zation 



t : (C PvisitingU tilization') 



- average hour/user/month = ., 

jV U sers 

- margin/user = a — average cost per user 



If the utilization at the Visiting WISPs exceeds the Home utilization in terms of average hour/user/month, the WISP has to pay a penalty for the leakage. Such an excess could be caused by leakage or by sharing of accounts. For virtual WISP E, the computed utilization is compared with the previously pledged [average user utilization], and this computed data is available from the WISPs with which virtual WISP E has collaborated. As illustrated in the tables below, for virtual WISP E the computed utilization of 41.4 average hour/user/month is almost the same as the pledged value of 41 average hour/user/month.

Parameter                   WISP E (Virtual)    WISP A (Extra Cost)
average cost per user       $17.76              $6.88
total utilization hour      82800               400000
average hour/user/month     41.4                400
margin/user                 $49.44              $73.12

In the case of WISP A, the computed utilization at the Visiting WISPs of 400 average hour/user/month far exceeds the Home utilization of 40 average hour/user/month. This could be due to leakage or sharing of accounts, so WISP A will pay the penalty for such leakage.
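The Stage III parameters and the leakage comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the aggregated totals for WISP E and WISP A are back-computed from the tables above (N_Users = total utilization hour / average hour/user/month, giving 2000 and 1000 users; a = average cost per user + margin/user), and the 5% tolerance used to treat 41.4 as "almost the same" as 41 is a hypothetical choice.

```python
from dataclasses import dataclass


@dataclass
class WispUsage:
    n_users: int           # N_Users: the WISP's subscriber count
    amounts_payable: list  # Amount Payable entries owed to Visiting WISPs ($)
    visiting_hours: list   # CP_VisitingUtilization entries (hours)
    a: float               # the per-user quantity 'a' in margin/user = a - average cost


def settlement_parameters(u: WispUsage) -> dict:
    """Compute the four per-WISP parameters listed above."""
    avg_cost = sum(u.amounts_payable) / u.n_users
    total_hours = sum(u.visiting_hours)
    return {
        "average_cost_per_user": avg_cost,
        "total_utilization_hour": total_hours,
        "average_hour_per_user_month": total_hours / u.n_users,
        "margin_per_user": u.a - avg_cost,
    }


def leakage_suspected(visiting_avg: float, reference_avg: float,
                      tolerance: float = 0.05) -> bool:
    """Flag leakage when Visiting-WISP utilization clearly exceeds the reference
    (pledged or Home) utilization; the tolerance value is hypothetical."""
    return visiting_avg > reference_avg * (1.0 + tolerance)


# Aggregated totals back-computed from the tables (one entry each).
wisp_e = WispUsage(n_users=2000, amounts_payable=[35520.0],
                   visiting_hours=[82800.0], a=67.20)
wisp_a = WispUsage(n_users=1000, amounts_payable=[6880.0],
                   visiting_hours=[400000.0], a=80.00)

for name, usage, reference in (("WISP E", wisp_e, 41.0), ("WISP A", wisp_a, 40.0)):
    p = settlement_parameters(usage)
    print(name, {k: round(v, 2) for k, v in p.items()},
          "leakage:", leakage_suspected(p["average_hour_per_user_month"], reference))
```

With these inputs WISP E stays within the tolerance of its pledge, while WISP A's 400 hours/user/month is flagged against its Home figure of 40.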



4 Conclusions 

We have described the BarterRoam model, a wireless roaming system in which a WISP contributes its excess capacity in a barter trade for usage in a Visiting WISP's coverage area, as an effective means of providing QoS that enables the Home WISP to expand its coverage footprint. This is achieved:

- Without incurring huge Hotspot setup cost; 

- Without incurring huge roaming settlement cost when Home WISP users roam into 
Visiting WISP coverage area. 

The proposed settlement methodology provides a formal computational approach to supporting a flat-rate charging mechanism for all WISPs (at peer level) in mobile and wireless environments.
Abstract. Web pre-fetching is a technique that tries to improve the QoS perceived by a user when surfing the web. Previous studies show that the cost of an effective hit rate is quite high in terms of bandwidth. This may be the reason why pre-fetching has not been commonly deployed in web proxies. Nevertheless, the situation can change in the context of 3G, where the radio access is a shared scarce resource and the operator may find it useful to trade fixed-network bandwidth for the QoS perceived by subscribed customers. Most importantly, in UMTS it is possible to charge for this service even though pre-fetching is provided by a third party. This paper studies this scenario, identifies the conditions under which pre-fetching makes sense, describes the way OSA/Parlay could be used to enable charging, presents a tool developed for this purpose and analyses several issues related to charging for this service.



1 Introduction 

Web pre-fetching is a well-known technology that has found a particular market niche in so-called internet boosters. The aim of this sort of application is to increase the effective QoS perceived by the end user by using spare access bandwidth to pre-fetch and cache the web pages most likely to be visited by the user in his/her next HTTP request. The average performance gain is driven by the ability of the prediction module to foresee the next link selected by the user. The context of this application is dial-up internet access over a low-speed modem (e.g. V.34 or V.90). At first sight, a reader may think that these internet boosters could be incorporated in 3G terminals and thus enhance the access bandwidth and virtually remove the currently high RTTs (Round-Trip Times) from the UMTS terminal to the Internet. However, the charging schemes applicable in 3G, which rate the volume of carried traffic, make this option unrealistic.



This work was partially funded by the IST project Opium (Open Platform for Integration of UMTS Middleware) IST-2001-36063 and the Spanish MCYT under project AURAS TIC2001-1650-C02-01. This paper reflects the view of the authors and not necessarily the view of the referred projects.
An alternative is the deployment of a proxy cache in the network that features pre-fetching (Fig. 1). This way, over-fetching does not happen on the limited-bandwidth radio segment and, consequently, this service can actually be delivered in a cost-effective way by the network operator and even exploited by a third party, as discussed in this paper. Therefore, pre-fetching can be used as a way to provide differentiated, application-specific QoS to a number of users on a subscription basis.








[Figure: UMTS segment (R_UMTS, RTT_UMTS) and Internet segment (R_Internet, RTT_Internet); per-object rate R(i) in bit/s and round-trip time RTT(i) in ms]

Fig. 1. Network-based pre-fetching in UMTS



The purpose of this paper is to show that this is possible and to identify key issues in its deployment, including the study of charging strategies for this specific QoS provisioning mechanism. The paper is organized as follows. Section 2 provides a quick overview of pre-fetching and its practical limits. Once the nature of the technique has been described, Section 3 discusses how this service could be exploited externally via OSA/Parlay. Section 4 describes a test scenario deployed in a real UMTS network that uses a pre-fetcher and charges for its usage. Finally, a number of conclusions are drawn from this experience in Section 5.



2 Web Pre-fetching 

Research on techniques to optimize web caching is very extensive. We therefore cite only those works that bring key ideas required to understand the process and how to charge for it in 3G. A broader survey can be found in [1].

As already defined, web pre-fetching is a technique that tries to improve the quality of service perceived in web browsing. The idea behind pre-fetching is to give an HTTP cache the initiative to retrieve in advance the web objects most likely to be downloaded by its users. This way, the hit ratio is increased and the effective latency perceived by the user is reduced, under the premise that excess bandwidth is available.

An important early work on pre-fetching techniques is [2], where the authors analyze the latency reduction and network traffic for personal local caches and introduce the idea of measuring conditional HTML link access probabilities and compressing this information with Prediction by Partial Matching. With this method, and using web-server-trace-driven simulation, they obtain a 45% reduction in latency at the cost of doubling the traffic. This can be considered a practical bound for prediction based on client access probabilities, i.e. the probability that the user accesses a link from a given page, based on the user's personal navigation history.
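The notion of a conditional link access probability can be illustrated with a simple first-order model. The sketch below is a deliberate simplification: it counts page-to-page transitions in a navigation history and does not implement the Prediction by Partial Matching compression of [2]. The `threshold` parameter plays the role of the link access probability threshold above which an object is pre-fetched; its default value and the class and method names are hypothetical.

```python
from collections import defaultdict


class LinkPredictor:
    """First-order estimate of P(next page | current page) from navigation history.

    A simplified stand-in for PPM-based prediction; pages are plain strings.
    """

    def __init__(self):
        # counts[current][nxt] = number of observed transitions current -> nxt
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, session):
        """Record every consecutive pair of pages visited in one session."""
        for current, nxt in zip(session, session[1:]):
            self.counts[current][nxt] += 1

    def probabilities(self, page):
        """Conditional link access probabilities out of `page` (empty if unseen)."""
        transitions = self.counts.get(page, {})
        total = sum(transitions.values())
        return {nxt: c / total for nxt, c in transitions.items()} if total else {}

    def to_prefetch(self, page, threshold=0.3):
        """Pages whose access probability exceeds the (hypothetical) threshold."""
        return sorted(nxt for nxt, p in self.probabilities(page).items() if p > threshold)


predictor = LinkPredictor()
for session in (["/", "/news", "/news/1"], ["/", "/news"], ["/", "/sport"]):
    predictor.observe(session)

print(predictor.probabilities("/"))
print(predictor.to_prefetch("/", 0.5))   # ['/news']
```

Raising the threshold trades hit rate for over-fetching: fewer objects are pre-fetched, so less extra bandwidth is consumed but fewer requests are served from the cache.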

With regard to pre-fetching with shared network caches, which, as mentioned above, is the scenario applicable to 3G, [3] estimates the theoretical limit of perfect caching plus perfect pre-fetching at a 60% reduction in client latency. More realistic results (40-50%) are documented in other works [4] that study the origin of this limit on improvement: it comes from the increase in delay and burstiness caused by the extra traffic put on the network by over-fetching. Regarding mobility, [5] analyses the effect of moving from one access technology to another on the effectiveness of pre-fetching.

From this brief survey of techniques and bounds, it can be concluded that:

• The improvement obtained by pre-fetching is not fixed and depends on the predictability of the user's behaviour.

• The gain is measured as a reduction in retrieval latency, and the maximum reduction is around 50%. This means that the perceived improvement is actually determined by the round-trip times, object sizes and link speeds of each context. In a context with low RTTs and high speeds, the gain can be negligible.

• Effective pre-fetching is expensive in terms of bandwidth consumption: the optimum is obtained at 100% over-fetching. Hence different strategies will be required depending on whether the link bandwidth is shared or not, and on its cost.



3 Charging for Third-Party Provided Pre-fetching Service in 3G 

The conclusions derived from the above observations seem to explain why pre-fetching is not enabled in today's web proxies. In fact, the main reason for deploying web proxies in campus and corporate networks is not just the reduction of latency, but also the saving of a fair amount of bandwidth on the shared internet access link. Pre-fetching would reverse the latter positive effect (bandwidth saving) for the sake of the former (latency reduction), whose noticeability varies. Moreover, the fixed internet access business model offers no easy framework for charging the user (not a terminal) according to complex rules.
On the other hand, 3G subscribers are identified individually after they turn on their terminal and type in their PIN, and a working billing system is available. The following situation may be frequent: the subscriber has paid for high-class ubiquitous wireless Internet access and finds that the perceived QoS is below the real access capacity, due to bottlenecks and long delays not (only) in the access network but somewhere in the Internet. In this 2G-3G context, network-based pre-fetching may be a tool to cover this demand, improve the delivered QoS and charge accordingly.



3.1 Charging Schemes for Pre-fetching 

Having described the nature of the service, its context and its theoretical bounds, we can now study several charging policies for pre-fetching:

• Flat rate. This option is the simplest to deploy, as it only requires subscription checking. Its drawback is that it relies strongly on the subscriber's confidence in the promised performance gain, a problem from which some commercial offers for differentiated services also suffer. In fact, as already described, pre-fetching cannot guarantee a predetermined QoS because of the random variables involved (mainly the user's will and the conditions of the path to the chosen web servers). Therefore, the user may consider such charging unfair, given the uncertainty of the obtained QoS.

• Charging per GGSN bandwidth devoted to pre-fetching on behalf of a given subscriber. This is not directly feasible because of the high number of subscribers sharing the Internet access of the 3G network: it is not possible to pre-allocate bandwidth for all of them. An alternative is to charge for a share of the bandwidth, divided proportionally among all active pre-fetching subscribers. Again the charging procedure is very simple, yet the problem is that a larger bandwidth share simply means a higher probability of obtaining an uncertain amount of extra QoS.

• Charging per pre-fetched information. The rationale for not recommending this option is the same as for the previous one. In both cases, however, there is a direct relation between the extra communication costs caused by each subscriber and the amount charged.

• Charging per saved delay. This is the fairest approach from the subscriber's perspective, as the charge is directly related not only to the success of the prediction module in terms of hits, but also to the exact gain achieved by each hit. On the other hand, it is the most difficult to implement. The client is charged only for the accumulated amount of saved delay. The service provider must keep careful control of incurred costs, but has the flexibility to allocate only unused resources to the pre-fetching activity.

• Fixed quota plus per-saved-delay charging. Given the operating costs, a fixed subscription fee is actually necessary, complementary to charging per saved delay, in order to establish a point where the service provider and client viewpoints meet.
The computation of saved delay is not straightforward. It could be estimated by this formula:

$$\Delta(I) = \sum_{i \in I} \left[ \mathrm{sizeof}(i) \left( \frac{1}{R(i)} - \frac{1}{R_{UMTS}} \right) + k \cdot RTT(i) \right] \qquad (1)$$

where $I$ is the set of web objects retrieved during a navigation session and the function sizeof() returns their respective sizes. Bit rates $R$ and round-trip times $RTT$ are defined in Fig. 1. $R_{UMTS}$ is considered constant, and $R(i)$ must be estimated and recorded by the pre-fetcher when the object is downloaded (the simplest implementation will log the size and download duration, integrating the delay due to connection setup and data transfer). The reader should note that the RTT term is particularly important when retrieving small objects or when the bottleneck in the path is the UMTS access. The factor $k \geq 1$ models the number of RTTs required for the TCP slow-start mechanism to reach the capacity of the path. A linear charging scheme proposed by the authors for this service would be given by:

$$C(I) = K_{sub} + K_t \, \Delta(I) \qquad (2)$$

where $K_{sub}$ and $K_t$ are expressed in monetary units and monetary units/ms, respectively.
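The saved-delay estimate and the linear charge can be computed from the pre-fetcher's hit log. The sketch below follows the form of Eq. (1) and Eq. (2) under several assumptions of our own: the `HitRecord` fields are hypothetical, sizes are logged in bytes and converted to bits, the transfer term is converted from seconds to milliseconds so that $K_t$ is a price per saved ms, and the transfer term is clipped at zero when the Internet path is faster than the UMTS access (so that, as noted above, only the RTT term contributes when UMTS is the bottleneck). The sample figures are invented.

```python
from dataclasses import dataclass


@dataclass
class HitRecord:
    """One pre-fetched object later requested by the user (hypothetical log entry)."""
    size_bytes: int         # sizeof(i)
    r_internet_bps: float   # R(i): rate estimated from logged size and download duration
    rtt_internet_ms: float  # RTT(i) between the pre-fetcher and the origin server


def saved_delay_ms(hits, r_umts_bps, k=1.0):
    """Delta(I): accumulated delay saving over the hits of one session, in ms."""
    total = 0.0
    for h in hits:
        bits = 8 * h.size_bytes
        # Transfer time saved by serving the object from the proxy, not the origin.
        transfer_ms = 1000.0 * bits * (1.0 / h.r_internet_bps - 1.0 / r_umts_bps)
        # k round trips (TCP slow start) to the origin server are also avoided.
        total += max(0.0, transfer_ms) + k * h.rtt_internet_ms
    return total


def charge(hits, r_umts_bps, k_sub, k_t, k=1.0):
    """Eq. (2): fixed subscription quota plus a price per saved millisecond."""
    return k_sub + k_t * saved_delay_ms(hits, r_umts_bps, k)


session = [
    HitRecord(size_bytes=50_000, r_internet_bps=100_000, rtt_internet_ms=300),   # slow, distant server
    HitRecord(size_bytes=2_000, r_internet_bps=1_000_000, rtt_internet_ms=150),  # UMTS is the bottleneck
]
print(saved_delay_ms(session, r_umts_bps=384_000, k=2))
print(charge(session, r_umts_bps=384_000, k_sub=1.0, k_t=0.0001, k=2))
```

The second record illustrates the remark in the text: a small object fetched over a fast Internet path contributes only through its RTT term.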



3.2 Provisioning and Charging by a Third Party 

The open network interface promoted by 3GPP [6] for UMTS (Universal Mobile Telecommunication System, the ETSI European standard of the 3G IMT-2000 system, launched commercially in Europe in the fourth quarter of 2003) provides a framework for A4C, along with a significant number of network control functions, that makes it possible for third-party service providers to deliver a wide range of transaction-oriented services over this telecommunications network. This open network interface is named OSA (Open Service Access) [7] and adopts work from Parlay [8]. In this section we study how OSA can be used to enable the provision of pre-fetching by a third party.

Figure 2 shows the main elements involved in this scenario. The pre-fetcher keeps track of the user's navigation and, when a given link access probability threshold is exceeded, pre-fetches the associated object. Based on the time and object size log available at the cache, it is possible to estimate the perceived delay gain and charge the user accordingly through the OSA interface. Note that provisioning of pre-fetching by an external entity implies that all the internet traffic is handled by this entity. This requires a low-delay connection between the operator and the service provider and may require NAT (Network Address Translation).
Fig. 2. Charging via OSA/Parlay



The interaction with OSA works as follows. Once the application, in this case the pre-fetching proxy, has been successfully authenticated by the Framework SCF (Service Capability Functions) of the Parlay gateway, it can access the authorized SCFs. The SCFs required for this service are the Charging SCF and a non-standard Terminal Session SCF, which will be described in the next section. All communication between the application and the Parlay gateway employs CORBA, although it is modelled as Java classes. For example, in the case of the Charging SCF, the relevant API calls are as follows: from the initial ChargingManager object, applications may create several ChargingSession objects that refer to a specific user and merchant. This class implements the interface for the most relevant credit/debit operations, used to request charging the user for some amount of money or in some volume unit such as bytes, either directly (directCreditUnitReq()) or against a reservation (debitAmountReq()) for pre-paid services.

A quick overview of some UMTS aspects is required to understand a few implementation issues of charging via OSA for services identified by IP address.

The UMTS architecture is strongly influenced by compatibility with the 2G digital telephony system (GSM) and with the packet-switched data service that evolved from it, GPRS (General Packet Radio Service). Two conceptually new elements have been introduced: the SGSN (Serving GPRS Support Node) and the GGSN (Gateway GPRS Support Node). These devices are in charge of data packet switching. In outline, the SGSN deals with mobility across RNCs (Radio Network Controllers), following the mobile stations in its service area, and with AAA functions, whereas the GGSN is the actual gateway to the Internet (see UMTS Forum [6] documents for further details).
Fig. 3. UMTS protocol architecture



Fig. 3 shows the protocol stacks running in each of these elements to transport IP packets. The purpose of this figure is to show that the transport of IP packets inside the UMTS network is complex and that the only fixed point in the network where persistent caching is possible is just behind the GGSN, as this is the single internet access point for the UMTS subnetwork. Another important issue when trying to charge for pre-fetching is that the "always-on" feature (a fixed global public IP address permanently allocated to the terminal) implies non-scalable resource consumption in UMTS nodes. Therefore, IP addresses are provided on demand via the dynamic creation of contexts for each data session. This means that the implicit authentication given by the source IP address is not valid all the time, and the binding <MSISDN, IP address> must be checked whenever the proxy observes a new data flow from an IP address that has been inactive for longer than the guard time $T_{guard}$ that UMTS waits before reassigning the address. Furthermore, since the release of an IP address is not conveyed to the proxy, charging requires periodic charging transactions on the OSA gateway interfacing the UMTS network. A leaky bucket whose period is less than the IP address reassignment guard time $T_{guard}$ is enough to guarantee that the identity of the user is still valid and that the charge goes to the right bill. This CDR generation rate also determines the throughput required at the OSA gateway to achieve a target Grade of Service. This can be determined by well-known traffic engineering formulae such as Erlang-B or Engset, where the maximum number of active data sessions that can be charged (the number of servers in the formula) is given by $m = C_{server}/C_{client}$, where $C_{server}$ is the capacity of the server in transactions/minute and $C_{client} > 1/T_{guard}$ is the CDR generation rate of a single data session.
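Two mechanisms in this paragraph lend themselves to a short sketch: revalidating the <MSISDN, IP address> binding after an inactivity period longer than T_guard, and dimensioning the OSA gateway with the Erlang-B formula using m = C_server/C_client servers. The code is illustrative only: `resolve_msisdn` is a hypothetical callback standing in for a query to the non-standard Terminal Session SCF (it is not a Parlay API call), the mean number of active data sessions is assumed to be the offered traffic in Erlangs, and all numbers in the example are invented.

```python
class IpBindingCache:
    """Caches <MSISDN, IP address> bindings, re-checking any IP idle for more than T_guard."""

    def __init__(self, t_guard_s, resolve_msisdn):
        self.t_guard_s = t_guard_s
        self.resolve_msisdn = resolve_msisdn  # hypothetical Terminal Session SCF lookup
        self.entries = {}                     # ip -> (msisdn, last_seen_s)

    def msisdn_for(self, ip, now_s):
        entry = self.entries.get(ip)
        if entry is None or now_s - entry[1] > self.t_guard_s:
            msisdn = self.resolve_msisdn(ip)  # the address may have been reassigned
        else:
            msisdn = entry[0]
        self.entries[ip] = (msisdn, now_s)
        return msisdn


def servers_from_capacity(c_server_per_min, c_client_per_min):
    """m = C_server / C_client, rounded down to a whole number of sessions."""
    return int(c_server_per_min // c_client_per_min)


def erlang_b(offered_erlangs, m):
    """Blocking probability B(A, m), via the standard numerically stable recursion."""
    b = 1.0
    for k in range(1, m + 1):
        b = offered_erlangs * b / (k + offered_erlangs * b)
    return b


# Invented figures: T_guard = 5 min, so one CDR every 4 min (0.25/min > 1/5 per min).
m = servers_from_capacity(c_server_per_min=600, c_client_per_min=0.25)
print(m, erlang_b(offered_erlangs=2300, m=m))
```

The recursion avoids computing large factorials directly, which matters when m reaches the thousands as in this example.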
4 A Prototype

In the context of project [9], a testing scenario for the concepts developed in this paper was set up. The targets were to assess the viability and performance of pre-fetching in UMTS, to evaluate provisioning by a third-party provider, and to test alternatives for charging for this service via OSA/Parlay. The OSA gateway employed was AePONA Causeway, set up at the Nortel Networks Hispania premises; Vodafone provided its UMTS network, and UC3M developed a web pre-fetcher [10]. Commercial exploitation of UMTS services had not yet started at the time of testing and, therefore, the system was not tested with real subscribers. The pre-fetcher was connected through a VPN to the OSA gateway for authentication and charging purposes according to the criterion defined in Section 3. The maximum delay gain (over 40%) was easily achieved after a number of code optimizations and training of the prediction module. This gain translated into tens of seconds when browsing medium-sized pages on distant servers (e.g. Australia from Spain), even though the pre-fetcher was suboptimally located outside the operator's premises.

Inheriting terminology from the telephony service, the OSA gateway issued Call Detail Records (CDRs) in a flexible, arbitrary format that enabled the production of a detailed charging log containing application-specific information about the charged event. As explained before, the experiment required an extension to the OSA API: the Terminal Session SCF. This extension makes it possible to find out which user (i.e. which MSISDN) is making a given HTTP request simply by checking the packet's source IP address. This new functionality is very important in order to make OSA fully transparent to the end user.

A main practical result of the experience is that, regardless of the use of pre-fetching and caching, the use of a network proxy in UMTS is always advisable. The main reason is the speed of proxies: they have multi-threaded retrieval capabilities not usually available in the web browsers of UMTS terminals. Furthermore, in the scenario deployed for the trial, the proxy was located 200 ms away from the terminal (RTT = 400 ms), and the improvement was still significant in sequential retrievals.



5 Conclusions 

Several conclusions can be drawn from the previous discussion and from the practical 
experience obtained with the test platform. 

• Today the mechanics of pre-fetching are well known, and it is clear that the performance gain is expensive in terms of bandwidth consumption and added traffic burstiness when performed on behalf of a large population of clients. However, in the context of internet access through 3G networks, where the business model is quite different from that of classical access, pre-fetching can be a value-added service offered to certain types of users on a subscription basis.

• Network-based pre-fetching is viable in UMTS. A rough estimate of the maximum performance gain is given by a 100% hit ratio which, in typical delay-bandwidth UMTS-Internet scenarios, leads to a delay reduction of up to 40% (when accessing distant or low-speed web servers) at the cost of doubling bandwidth consumption. This means tens of seconds saved when downloading a web page of medium complexity. As demonstrated in the IST project Opium, OSA enables a business model for pre-fetching in UMTS that would be very difficult to deploy for fixed internet access subscribers, and whose low scalability must be controlled through subscription.

• It must be remarked that a network-based pre-fetcher does not improve the throughput of the radio segment, but rather the perceived end-to-end bandwidth, which is limited by bottlenecks and delay in the fixed part of the network. In other words, the only case in which this pre-fetching scenario is really cost-effective is when accessing distant servers. Otherwise, it is the optimized multi-threading capability of the proxy that predominates and still justifies the insertion of a proxy.

• Thanks to OSA/Parlay, it is possible to authenticate, check the subscription of, and charge subscribers of web content pre-fetching. This worked as expected and all tests passed. The range of applicable tariff models is very wide (charging for effective pre-fetching, for the average virtual bandwidth excess granted, for the amount of pre-fetched information, etc.), and its granularity is limited by the rate of charging transactions at the OSA/Parlay gateway, which is itself constrained by a linear communication/computation overhead. CDRs were generated at a maximum rate of 2 CDRs/minute in our tests. CDRs were issued only when the user selected a link that had been pre-fetched on his/her behalf, and the amount of money charged was proportional to the real delay saving. Given the unpredictable effectiveness of pre-fetching, this seems to be the fairest cost model applicable to this scenario.
Abstract. In the current Internet, business relationships and agreements 
between peered ISPs do not usually make specific guarantees on reachability, 
availability or network performance. However, in the next generation Internet, 
where a range of Quality of Service (QoS) guarantees are envisaged, new 
techniques are required to propagate QoS-based agreements among the set of 
providers involved in the chain of inter-domain service delivery. In this paper 
we examine how current agreements between ISPs should be enhanced to 
propagate QoS information between domains, and, in the absence of any form 
of central control, how these agreements may be used together to guarantee 
end-to-end QoS levels across all involved domains of control/ownership. 
Armed with this capability, individual ISPs may build concrete relationships 
with their peers where responsibilities may be formally agreed in terms of 
topological scope, timescale, service levels and capacities. We introduce a new 
concept of QoS-proxy peering agreements and propose a cascade of inter- 
domain Service Level Specifications (SLSs) between directly attached peers: 
each ISP meeting the terms of the SLSs agreed with upstream peers by being 
responsible for its own intra-domain service levels while relying on 
downstream peers to fulfill their SLSs. 



1 Introduction 

There is a growing trend towards IP-based services, not only from a technical perspective (e.g. VoIP (Voice over IP) services) but also from a business perspective (e.g. the emergence of Internet-based Application Service Providers). As more and more network-performance-sensitive services migrate to IP networks, best-effort networks no longer meet their QoS requirements. Services therefore require differentiation, so that different QoS levels can be provided for different applications over the Internet at large.

The issue of provisioning end-to-end QoS in the Internet is currently being investigated by both the research and standardisation communities. Requirements from ISP perspectives and proposals targeted at building MPLS-based inter-domain tunnels have recently been submitted to the IETF [9], [10], [11] by key players in the field. As a pure layer 3 (routed) solution, the European-funded IST research project MESCAL [7] is investigating solutions targeted at building and maintaining a QoS-aware IP layer spanning multiple domains, and considers the architecture, implementation and business models associated with it [4].

To provide end-to-end QoS across the Internet, closer co-operation between multiple ISPs is required. The business relationships between ISPs must be considered, and this is the main focus of this paper, although we do not consider accounting and data collection methods, charging, rating or pricing models here. A number of assumptions are made when approaching the co-operation problem. Firstly, we assume that there is no global controlling entity over all ISPs; secondly, that the problem of intra-domain QoS is already solved and that different domains offer a range of local Quality Classes (l-QCs) between their edge routers.

With these assumptions in place, we describe and analyse various modes of distributed interaction between ISPs and propose suitable business models to provide both loose (qualitative) and statistical (quantitative) end-to-end QoS guarantees. However, achieving this ISP interaction requires a strengthening of business relationships, with explicit QoS information being part of these agreements. To this end we use pSLSs (provider Service Level Specifications) to describe QoS attributes to given destinations. These pSLSs are then used to concatenate l-QCs to form extended Quality Classes (e-QCs) to remote destinations. We analyse various ways in which this concatenation can be achieved and their implications for scalability.
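As an illustration of how l-QCs might be concatenated into an e-QC, the sketch below composes per-domain QoS attributes along a chain of domains. The attribute set (a one-way delay bound and a packet loss ratio) and the composition rules (delay bounds add; loss ratios combine as 1 - Π(1 - p_j), assuming losses in different domains are independent) are standard assumptions chosen for illustration, not rules stated in this paper; the class names and values are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class QualityClass:
    name: str
    delay_ms: float  # one-way delay bound across the domain (edge to edge)
    loss: float      # packet loss ratio in [0, 1]


def concatenate(name, classes):
    """Compose a chain of l-QCs (upstream to downstream) into an e-QC."""
    delay = sum(qc.delay_ms for qc in classes)
    # Probability that a packet survives every domain, assuming independent losses.
    delivered = reduce(lambda acc, qc: acc * (1.0 - qc.loss), classes, 1.0)
    return QualityClass(name, delay, 1.0 - delivered)


e_qc = concatenate("e-QC to prefix P", [
    QualityClass("l-QC gold, ISP A", delay_ms=10, loss=0.001),
    QualityClass("l-QC gold, ISP B", delay_ms=25, loss=0.002),
])
print(e_qc)
```

Because each step only needs the downstream e-QC and the local l-QC, the same function can be applied hop by hop, which is the property a cascaded arrangement of pSLSs relies on.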

Once the pSLSs have been created, the further problem of financial settlement 
must be considered. Drawing on concepts from the current Internet and the global 
PSTN (Public Switched Telephone Network) we propose new peering agreements 
and examine the flow of monies across the new QoS-enabled Internet. 

The paper is organised as follows. Section 2 describes the current arrangements between ISPs and the business relationships between international PSTN operators. Section 3 analyses source-based and cascaded models for QoS peering. Section 4 considers business cases for both loose and statistically guaranteed performance levels and the associated financial settlement issues between ISPs. Finally, Section 5 presents our conclusions.



2 Business Relationships and Financial Settlements in the 
Current Internet and PSTN Networks 

The global Internet is a collection of independently operated networks whose organisation has, in retrospect, been modelled by a three-tiered hierarchy [6] (Figure 1). An ISP's connectivity and position in the tier model depend on its size, its geographic reach, its capacity (in terms of link speeds and routing capability) and the reachable prefixes available to it. While this model is not strictly accurate, it serves to demonstrate the variety of ISPs and their relationships within the Internet.

Currently, in the best-effort Internet, there are two distinct forms of relationship between ISPs for traffic exchange, underpinned by respective business agreements: peer-to-peer and transit (customer-provider). In a transit relationship, one provider offers another ISP forwarding (transit) to the destinations in its routing table (possibly the whole global Internet) for a charge. Usually, this type of business relationship is between ISPs belonging to different tiers of the three-tier Internet model (the lower-tier ISP being a customer of the upper-tier ISP). Peer-to-peer is the business relationship whereby ISPs reciprocally provide access only to each other's customers; it is a non-transitive relationship. It is a kind of 'short-cut' that keeps traffic from flowing into the upper tiers and allows traffic to flow directly between the peering ISPs.




Fig. 1. A post-hoc approximation of a three-tier Internet model with peering/transit agreements.



Financial Settlements 

The financial settlements between ISPs primarily depend on their business relationship. In the service-provider settlement, a customer (end customer or ISP) pays a flat rate or a usage-based amount to the provider ISP for reachability to the networks that the provider ISP can reach through its peers, its customers or its own provider ISPs. The customer always pays, whether the traffic is being sent or received.

In the negotiated-financial settlement, the traffic volume in each direction is 
monitored and then payment is made on the net flow of traffic. 

In the settlement-free agreement (also known as the Sender-Keeps-All, or SKA, agreement in the PSTN), neither ISP pays the other for traffic exchange, and they usually split the physical layer costs between them. This settlement is a special case of the negotiated-financial settlement, which arises either because the traffic is symmetric or because the perceived gain to each party is considered worth the agreement.
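The three settlement types can be contrasted with a small function computing what ISP A pays ISP B over one billing period. This is a hypothetical sketch: the per-gigabyte price, the flat fee, and the convention that the net sender pays in a negotiated-financial settlement are assumptions for illustration; the text above only states that payment is made on the net flow of traffic.

```python
from enum import Enum


class Settlement(Enum):
    SERVICE_PROVIDER = "service-provider"          # A is the customer, B the provider
    NEGOTIATED_FINANCIAL = "negotiated-financial"
    SETTLEMENT_FREE = "settlement-free"            # Sender-Keeps-All


def payment_a_to_b(kind, gb_a_to_b, gb_b_to_a, price_per_gb, flat_fee=0.0):
    """Amount A pays B for one period; a negative value means B pays A."""
    if kind is Settlement.SERVICE_PROVIDER:
        # The customer pays whether traffic is sent or received.
        return flat_fee + price_per_gb * (gb_a_to_b + gb_b_to_a)
    if kind is Settlement.NEGOTIATED_FINANCIAL:
        # Only the net flow is paid for (assumed convention: the net sender pays).
        return price_per_gb * (gb_a_to_b - gb_b_to_a)
    # Settlement-free: no payment for traffic exchange.
    return 0.0


for kind in Settlement:
    print(kind.value, payment_a_to_b(kind, gb_a_to_b=10, gb_b_to_a=4,
                                     price_per_gb=2.0, flat_fee=100.0))
```

When the traffic is symmetric, the negotiated-financial payment falls to zero, which is the special case in which it coincides with the settlement-free agreement.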

PSTN Networks 

An existing network that requires close business relationships to provide end-to-end better-than-best-effort communication is the PSTN (Public Switched Telephone Network). The interconnection of international PSTN networks shares a number of traits with the inter-domain QoS problem: resources must be reserved to provide the required level of service, but direct interconnection between all networks is not possible, and therefore trust relationships are required. When crossing international boundaries, the problem of trust becomes more acute, and the transit network topology is a function of political and financial considerations. These peering agreements are overseen by the International Telecommunication Union (ITU) and the World Trade Organisation (WTO) and usually result in a financial settlement such as the Accounting Revenue Division Procedure [1]. Here the caller at the originating network is charged at a customer rate, and the originating network pays the terminating network to terminate the call at a previously negotiated rate, the settlement rate. The transfer of monies is then usually performed only on the net flow, in a negotiated-financial-settlement style of agreement. If, however, the perceived value to each party is similar, as it may be when the edge operators adjust their customer rates, the inter-operator agreements approach the SKA (sender-keeps-all) agreement.

These ideas could be carried over into a QoS-enabled Internet business model, albeit with the following differences: the PSTN has a single QoS traffic unit, the call-minute, whereas a QoS-enabled Internet could have many QoS levels. Also, the PSTN, especially at the international level, is closer to a flat structure than to a hierarchy, and has organisations in loose control to mediate and arbitrate settlements. Another issue that must be considered is which traffic end-point is the initiator, who gains from the data flow, and who is charged for it [3], as well as methods of charging for multicast traffic. Some of these issues have been considered in the split-edge pricing work in [2].



3 QoS Peering Approaches 

In the current Internet, ISPs form business relationships with one another and deploy 
links between their networks, either directly or through Internet exchange points 
(IXPs). BGP policies are then deployed to determine which prefixes will be 
advertised to adjacent providers and subsequently best-effort traffic may be routed 
across the Internet. 

In our view, a QoS-enabled Internet needs additional agreements to determine the QoS levels, traffic quantities and destinations to be reached across the pre-existing inter-domain links, together with the agreed financial settlement terms, penalty clauses, etc. The technical aspects of the QoS agreements between ISPs are contained in pSLSs, introduced in section 1. Once the pSLSs are in place, BGP will announce the existence of destinations tagged with the agreed QoS levels, and traffic conforming to the pSLSs may be forwarded to remote destinations and receive the appropriate treatment to meet the agreed performance targets. pSLSs are meant to support aggregate traffic, and they are assumed to be in place prior to any agreements with end customers (via customer-SLSs, or cSLSs) or upstream ISPs (via pSLSs) to use services based on them. They are negotiated according to the business policies of the provider and the outcome of the deployed off-line inter-domain traffic engineering algorithms, which use forecasts of customer traffic as input. pSLSs are considered semi-permanent, changing only when traffic forecasts for aggregate traffic alter significantly or when business policies are modified. They should not be seen as dynamic entities to be deployed for individual customer flows.
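To make the role of a pSLS concrete, the sketch below models one as a plain record and derives the (destination, QoS level) pairs a domain could announce once its pSLSs are in place. The field names (`peer`, `qos_level`, `destinations`, `rate_mbps`) are hypothetical and chosen for illustration; they are not the MESCAL pSLS information model, and in practice the announcements would be carried by BGP extensions rather than computed by a Python function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSLS:
    """Hypothetical, simplified provider-level SLS for aggregate traffic."""
    peer: str                # adjacent ISP providing the QoS transit
    qos_level: str           # agreed QoS level, e.g. "gold"
    destinations: frozenset  # destination prefixes covered by the agreement
    rate_mbps: float         # agreed aggregate traffic quantity


def qos_announcements(pslss):
    """Return the sorted (prefix, qos_level) pairs made reachable by the pSLSs."""
    routes = set()
    for s in pslss:
        for prefix in s.destinations:
            routes.add((prefix, s.qos_level))
    return sorted(routes)
```

The same prefix may appear once per QoS level, which anticipates the routing table growth discussed in the conclusions.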

There are many models for the interconnection and service-layer interactions between ISPs. Such models are required not only to establish a complete end-to-end customer service, but also to provide and maintain the hierarchy of related management information flows. Eurescom specified organisational models for the support of inter-operator IP-based services [8]. The models may be grouped into three configurations known as the cascade model, the hub model, and the mixture model, a combination of the two. The type of inter-domain peering affects the service negotiation procedures, the required signalling protocols, the QoS binding, and path selection. In the following we give an overview of the source-based and cascaded models and analyse their pros and cons.

P. Georgatsos et al.

Source-Based Approach 

The source-based approach (similar to the hub model from Eurescom) disassociates 
pSLS negotiations from the existing BGP peering arrangements. The originating do- 
main knows the end-to-end topology of the Internet and establishes pSLSs with a set 
of adjacent and distant domains in order to reach a set of destinations, with a 
particular QoS. 

As shown in Figure 2, the originating domain (AS1) is responsible for managing the overall requested QoS service/connection. To manage customer requests, the provider (AS1) directly requests peering agreements (pSLS1 and pSLS2) with providers AS2 and AS3, and with any other network provider involved, in order to create an e-QC (from AS1 to AS3).



[Figure: transit domain and egress domain; pSLS2; cSLS ordering by Customer A and Customer B. PE / P / BR: Provider Edge / Provider (Core) / Border Router.]

Fig. 2. Source-Based Approach



Cascaded Approach 

In the cascaded approach, each ISP establishes pSLSs only with adjacent ISPs, i.e. those ISPs with which it has existing BGP peering relationships. Figure 3 gives an overview of the operations in this approach. The domain AS3 supports an intra-domain QoS capability (an l-QC). AS2 supports its own intra-domain QoS capability (an l-QC) and is a BGP peer of AS3. AS2 and AS3 negotiate a contract (a pSLS) that enables customers of AS2 to reach destinations in AS3 with a QoS (an e-QC). This process can be repeated recursively to enable AS1 to also reach destinations in AS2 and AS3, but at no point do AS1 and AS3 negotiate directly.

There are no explicit end-to-end agreements in the cascaded approach; each domain may build upon the capabilities of adjacent downstream ASs to form its own e-QCs to the required destinations. This recursive approach results in an approximation of end-to-end agreements, with the merit, as discussed below, of being more scalable.
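The recursion in the cascaded approach can be sketched as follows. Each AS combines the destinations it reaches with its own local QoS capability with the e-QC reach offered by each downstream neighbour it holds a pSLS with. This is a simplification under stated assumptions: a single QoS level, pSLSs modelled as directed edges, hypothetical AS and destination names, and no capacity or path-selection logic.

```python
# Hypothetical data: destinations each AS reaches with its own l-QC
LOCAL_REACH = {"AS1": {"d1"}, "AS2": {"d2"}, "AS3": {"d3a", "d3b"}}
# Hypothetical directed pSLSs: each AS buys QoS transit from the listed neighbours
PSLS_EDGES = {"AS1": ["AS2"], "AS2": ["AS3"], "AS3": []}


def eqc_reach(asn, local_reach, psls_edges, _visiting=frozenset()):
    """Destinations asn can offer with QoS: its own l-QC reach plus the
    e-QC reach of each adjacent downstream AS it holds a pSLS with."""
    reach = set(local_reach[asn])
    visiting = _visiting | {asn}  # guard against loops of pSLSs
    for downstream in psls_edges.get(asn, []):
        if downstream not in visiting:
            reach |= eqc_reach(downstream, local_reach, psls_edges, visiting)
    return reach
```

AS1 obtains the destinations in AS3 only through the e-QC that AS2 has already built; in a real deployment each recursive step would be a pSLS negotiation with a neighbour, not a function call into that neighbour's data.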





Provider-Level Service Agreements for Inter-domain QoS Delivery 



[Figure: traffic forwarded downstream towards the egress domain (AS3), with l-QC and e-QC labels; cSLS ordering by Customer A and Customer B. PE / P / BR: Provider Edge / Provider (Core) / Border Router.]

Fig. 3. Cascaded Approach

Strengths and Limitations of the Source-Based and Cascaded Approaches 

The originating domain in the source-based approach requires an up-to-date topology of the Internet, including the existence and operational status of every physical link between ASs. In the cascaded approach, by contrast, each ISP in the chain only needs to know its adjacent neighbours and the status of the related interconnection links.

In both approaches inter-domain routing is pSLS-constrained, i.e. traffic will only pass through ASs where pSLS agreements are already in place. Since the originating domain in the source-based approach has topological knowledge of all domains and their interconnections, it can exercise finer control over the chain of pSLSs through to the destinations. In the cascaded approach, no AS participating in the chain has the full topology, and the initiator is obliged to use the e-QCs previously constructed by the downstream domain. There is therefore less flexibility and control over the whole IP service path, which may result in sub-optimal paths, although the QoS constraints of the traffic will still be met.

In conclusion, a single point of control for the service instances is the compelling feature of the source-based approach. However, it would be difficult to manage for more than a few interconnected ISPs, and most providers are expected to prefer the cascaded approach, which reflects the loosely coupled structure of the Internet. The cascaded approach makes it possible to build IP QoS services on a global basis while maintaining contractual relationships only with adjacent operators. Hence, this approach is more scalable than the source-based approach; it also mirrors the current behaviour of BGP.



4 Inter-domain Business Relationships 

Considering a hop-by-hop, cascaded approach for interactions between providers, the following business cases are proposed. Figure 4 depicts the first business case, which directly corresponds to the business model of the Internet as it stands today. The business relationships (transit and peer-to-peer) need to be supported by appropriate pSLSs to allow the exchange of QoS traffic. Due to the bi-directional nature of these business relationships and their broad topological scope, only services with loose, qualitative QoS guarantees may be supported in this case.




[Figure legend: pSLS-based customer/provider relationship; pSLS-based peer/peer relationship.]

Fig. 4. Case A: Provisioning services with loose QoS guarantees

To provision services with statistical guarantees on quantitative bandwidth and performance metrics, a so-called upstream-QoS-proxy (or simply QoS-proxy) relationship needs to exist between ISPs. Figure 5 illustrates this second business case.




[Figure legend: pSLS-based upstream-QoS-proxy relationship.]

Fig. 5. Case B: Provisioning services with statistical QoS guarantees

In the QoS-proxy relationship, either of the ISPs may agree with the other ISP to 
provide a transit QoS-based connectivity service to (a subset of) the destinations it can 
reach with this QoS level. The ISP offering the transit QoS service would have built 
its QoS reach capabilities based on similar agreements with its directly attached ISPs 
and so on. 

Two things are worth noting about this business relationship: first, its liberal, unidirectional nature, where either ISP can use the other as a QoS proxy; and, second, its strong collaborative and transitive nature, which is built in a cascaded fashion. The QoS-proxy business relationship differs from the peer-to-peer and customer-provider relationships of the current best-effort Internet in the meaning of the established agreements on traffic exchange and, consequently, in the directionality of the traffic flows. In customer-provider relationships, agreements are established for transporting traffic from/to the customer ISPs to/from the provider ISPs; in peer-to-peer relationships, agreements are established for the ISPs to exchange traffic on a mutual basis; in QoS-proxy relationships, agreements may be established independently in either direction, as each ISP wishes.

The differences between the two business models, as discussed above, can be attributed to the different types of services each model is set up to provide. In a best-effort or loose-QoS-based connectivity service Internet, geographical coverage is clearly the strongest selling point; hence the customer-provider and peer-to-peer relationships. In a statistical-QoS-based service Internet, provided that there is a global demand for related services, the ability to provide such QoS levels clearly becomes an asset. What is required, therefore, is to seek suitable peers to deliver such QoS to the desired destinations; hence the QoS-proxy relationship, which applies equally to all ISPs, regardless of their size.

The business model based on the QoS-proxy relationship resembles current practice in the traditional telecom service world. The QoS-proxy business agreements could be seen as corresponding to the call-termination agreements established between telephony operators/providers. Furthermore, in the telco world, synergies between operators are built mainly on grounds of reliability and competitiveness, much as the flat Internet business model implies. This resemblance is not surprising: telephony calls and IP services based on statistically guaranteed quantitative QoS metrics, provided that there is a global demand for them, are very similar in that both are commodities that need to be widely offered at a certain quality.

Financial Settlements 

First, it should be made clear that the financial settlements in a QoS-aware Internet 
are in addition to the settlements made for best-effort connectivity. Broadly speaking, 
the following two principles govern these settlements: the ISP who requests the pSLS 
pays the other; and in the cases where pSLSs exist in both directions or the pSLSs 
have the connotation of mutual agreements, payment reconciliation may take place. 



Table 1. Financial Settlements in the QoS-aware Internet

Type of business relationship | Type of financial settlement
------------------------------|------------------------------------------------------------
customer-provider             | service-provider settlement
peer-to-peer                  | negotiated-financial or settlement-free agreement
QoS-proxy                     | service-provider settlement, if only one ISP requests pSLSs;
                              | negotiated-financial or settlement-free agreement, if the ISPs request pSLSs from each other
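Table 1 can be read as a small decision rule. The sketch below encodes it, assuming the relationship type and the direction(s) in which pSLSs are requested are known. The function name and string labels are illustrative only, and the "no pSLS" branch is an assumption that follows from the principle that the ISP requesting a pSLS pays.

```python
def settlement_type(relationship: str, a_requests: bool, b_requests: bool) -> str:
    """Map a business relationship and the pSLS request directions to Table 1."""
    if relationship == "customer-provider":
        return "service-provider settlement"
    if relationship == "peer-to-peer":
        return "negotiated-financial or settlement-free agreement"
    if relationship == "QoS-proxy":
        if a_requests and b_requests:   # pSLSs requested from each other
            return "negotiated-financial or settlement-free agreement"
        if a_requests or b_requests:    # only one ISP requests pSLSs
            return "service-provider settlement"
        # Assumption: with no pSLS requested there is no QoS settlement
        return "no QoS settlement"
    raise ValueError(f"unknown relationship: {relationship}")
```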



5 Conclusions 

We propose in this paper that agreements for QoS traffic exchange need to be in place between ISPs in addition to agreements for best-effort connectivity. As with all agreements between ISPs, the establishment of these pSLSs is driven by the business objectives of the ISP. By augmenting the peer-to-peer and transit (customer-provider) relationships of today’s Internet with QoS-based pSLSs we may achieve loose inter-domain QoS guarantees. For provisioning harder QoS guarantees to specific destinations, a new QoS-proxy business relationship is proposed. This type of business relationship could be thought of as the QoS Internet counterpart of call-termination agreements in the PSTN or VoIP business world.

The following points are worth mentioning regarding the proposed business 
relationships. First, they are built on top of existing best-effort agreements and they 
are not exclusive: e.g. ISP A may act as transit for ISP B, while using ISP C as a QoS- 
proxy, if this best fits its business objectives. The proposed business models therefore 
offer an incremental, best-effort compatible migration path towards a QoS-capable 
Internet. 

Second, a key aspect of the proposed business relationships is that they are based on pSLSs, i.e. service agreements implying legal obligations. On the one hand, this offers a tangible lever for ensuring trust between ISPs, which after all is the bottom line of any inter-domain solution. On the other hand, it might discourage ISPs. However, given that QoS-based services are offered to end-customers on the basis of Service Level Agreements, requiring the same of ISPs seems reasonable and fair.

Third, the feasibility of the proposed business relationships is currently being investigated by the IST MESCAL project [7]. The MESCAL solutions encompass service management and traffic engineering functions and rely on interactions between adjacent ISPs both at the service layer, for pSLS establishment, and at the IP layer, for determining, configuring and maintaining suitable inter-domain QoS routes. The MESCAL validation work covers:

• information models for describing pSLSs under each of the identified business 
relationships, 

• the logic and protocols for pSLS negotiation, 

• mechanisms for parsing and extracting the necessary traffic engineering 
information from the pSLSs, 

• extensions to the BGP protocol and associated route selection algorithms to allow the exchange and processing of QoS routing information based on the established pSLSs,

• off-line traffic engineering algorithms for determining the required set of pSLSs 
based on anticipated QoS traffic demand and dimensioning network resources 
accordingly. 

Inevitably, the main price to be paid is the increase in the size of the Internet routing tables, which grow linearly with the number of distinct QoS levels offered. Further simulation and testbed experimentation work [5] for assessing inter-domain routing performance aspects is currently underway.
Abstract. Delivering end-to-end QoS is emerging as a key requirement for the
success of innovative mobile data services. However, overall service quality 
tends to be susceptible to variations in the performance of mobile networks. 
SPs (service providers) will need to know what performance and QoS parame- 
ters are important in a service delivery chain, and how they can be measured 
and aggregated to build an overall picture. This is proving to be challenging 
and placing new requirements on traditional OSSs (Operations Support Sys- 
tems). For example, OSS process flows involved in SLA (Service Level 
Agreement) violation monitoring are markedly more complex. Consequently, 
OSS functions are more interdependent. There is a need to map out and define 
such interdependencies. This paper presents an OSS architecture that enables 
QoS reporting and SLA violation monitoring. Key OSS functions are defined 
and an information model capturing QoS requirements is presented. The results 
of the implementation and validation of the architecture are also presented. 



1 Introduction 

QoS is a measure that indicates the collective effect of service performance, which determines the degree of satisfaction of a user of the service. The measure is derived from the ability of the network and computing resources to provide different levels of service to applications and associated sessions. Customers requiring 3G services for business purposes are demanding guarantees on service quality. Providing such guarantees and maintaining agreed QoS levels is made more complex by the end-to-end nature of QoS support in a roaming environment [1]. Service quality can vary over different network technologies, and OSSs need to interoperate in order to ensure that the agreed quality of service can be achieved wherever users are located. This requires performance data from each of the service providers involved in delivering the service, in order to establish the overall quality of the service that customers are using.

Providing 3G services, e.g. location-based services or corporate intranet access, on mobile networks requires a transport connection that traverses many providers and networks [2]. Maintaining QoS in such an environment is going to be complex and will require a more versatile and comprehensive approach towards QoS management [3] [4]. Given this situation, there is a strong need for a clear-cut understanding of the managed system, the management functions, and the management information. The problem that OSS developers face is that the QoS area is very broad. This paper presents a coherent view of what is essential for the management of end-to-end QoS and SLA violations.

The rest of the paper is organized as follows. Sect. 2 presents a mobile QoS management environment. Sect. 3 defines the OSS functions required. Sect. 4 presents the information model required for SLA violation management. Sect. 5 provides more detailed information on the OSS components implemented to realise the OSS functions presented in this paper. Sect. 6 concludes the paper.



2 Mobile QoS Management Environment 

The word environment here captures two important aspects of service quality in mo- 
bile networks: delivery and measurement (see Fig. 1). 

QoS delivery is intertwined with measurement, and a broader understanding of this matter requires that the two aspects be seen in a combined view. The environment shown presents QoS and SLA violation management through horizontal and vertical cross-sections. The horizontal cross-section represents the QoS delivery view, from the UE to the server side, including the transport networks in between. The vertical cross-section, fully described in Sect. 3, represents QoS measurement.

There are three provider domains, namely MAN (Mobile Access Network), EDN (External Data Network) and CAN (Content/Application Network), from which the resources supporting a service are drawn. This configuration clearly demarcates the boundaries of the domains that support end-user services from the UE to the server side [5]. Measuring end-to-end QoS involves measuring the QoS of the three domains or segments shown in Fig. 1. MAN combines the Local and GPRS Bearer Services, which implement TCP/UDP/IP and GTP (GPRS Tunnelling Protocol) [6] [7]. The second segment is the EDN, which corresponds to the External (or IP) Bearer Service. This is the IP backbone that carries Web application, FTP and email traffic and supports RSVP and DiffServ [8]. Lastly, CAN is an intranet belonging to an SP or a third-party SP.

[Figure: delivery across the Local Bearer Service and GPRS Bearer Service (Core Packet Backbone), the External (or IP) Bearer Service and the CAN (or Intranet) Bearer Service; performance meters supply performance data to Resource Performance Management, Service Quality Management and Customer QoS/SLA Management, with Service Product Quality Objectives at the top. CAN: Content/Application Network; UE: User Equipment; SGSN: Serving GPRS Support Node; UTRAN: Universal Terrestrial Radio Access Network; GGSN: Gateway GPRS Support Node; MAN: Mobile Access Network; EDN: External Data Network; GPRS: General Packet Radio Service.]

Fig. 1. Mobile QoS management environment

The aggregated values of a service product KQI (Key Quality Indicator) [9] are derived from the values of lower-level service KQIs, which focus on the performance of individual services. Services depend on the reliability of network and service resources, and KPI (Key Performance Indicator) values [9] are calculated from the performance data provided by the resources. An example of a service KQI is 90% availability of the end-to-end network connection. To calculate this value, the values of at least two KPIs are required, namely resource up-time and resource down-time.
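For the availability example above, a common way to combine the two KPIs is availability = up-time / (up-time + down-time). The paper does not state the formula, so treat it as an assumption; the function names are hypothetical.

```python
def availability_kqi(up_time_s: float, down_time_s: float) -> float:
    """Availability in [0, 1] from up-time and down-time KPIs (seconds).

    Assumed formula: up-time / (up-time + down-time).
    """
    total = up_time_s + down_time_s
    if total <= 0:
        raise ValueError("no observation time")
    return up_time_s / total


def meets_objective(up_time_s: float, down_time_s: float, target: float = 0.90) -> bool:
    """True if the achieved availability meets the KQI objective (default 90%)."""
    return availability_kqi(up_time_s, down_time_s) >= target
```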



3 OSS Functions and Process Flow 

The OSS functions defined in this section concern the OSS support required by the SPs, the mobile network operator and the customer, as well as the support required between an SP and a third-party SP. To define the OSS functions, the layers of the eTOM [20] model are used in the Assurance business process. The requirements of these stakeholders on the OSS are formulated from the Assurance business processes and incorporated in the definitions.



3.1 OSS Function Definitions 

Customer QoS/SLA Management. This function ensures that the performance of a product delivered to customers meets the SLA parameters, or product KQI objectives, as agreed in the contract concluded between the SP and the customer. This function has four specific operations. SLA specification specifies the SLA parameters for a new product, including soft and hard thresholds. SLA monitoring collects the service KPI values achieved and aggregates them into the product KQI values achieved. SLA analysis analyses the product KQI values achieved against the agreed SLA parameters and triggers further action if the KQI values achieved do not meet the agreed SLA values. SLA reporting produces product performance summaries and trend reports of the SLA parameter values achieved compared with the agreed SLA values.
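The analysis operations compare an achieved value against soft and hard thresholds. A minimal sketch, assuming a "higher is better" metric where falling below the soft threshold yields a warning and falling below the hard threshold yields an alarm; the paper does not fix these semantics, and the names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"  # assumed: soft threshold crossed, quality about to deteriorate
    ALARM = "alarm"      # assumed: hard threshold crossed, quality has deteriorated


def analyse(value: float, soft: float, hard: float) -> Status:
    """Classify an achieved KQI value; assumes higher is better and hard <= soft."""
    if hard > soft:
        raise ValueError("hard threshold must not exceed soft threshold")
    if value < hard:
        return Status.ALARM
    if value < soft:
        return Status.WARNING
    return Status.OK
```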

Service Quality Management. This function ensures that the performance of a service delivered to customers meets the service KQI objectives. It uses mathematical algorithms that transform KPI values into KQI values. This function has four specific operations. Service KQI specification specifies service KQIs and their objectives for a new service, including soft and hard thresholds. Service KQI monitoring collects the resource KPI values achieved and aggregates them into the service KQI values achieved using algorithms. Service KQI analysis analyses the service KQI values achieved against the service KQI objectives and issues warnings and alarms if the values achieved do not meet the objectives. Service KQI reporting produces service performance summaries and trend reports of the KQI values achieved compared with the KQI objectives.
Resource Performance Management. This function ensures that the performance of
the resources involved in delivering services meets the resource KPI objectives. This 
function has four specific operations. Resource KPI specification specifies resource 
KPIs and their objectives. Resource KPI monitoring collects performance data and 
aggregates it into KPI values achieved. Resource KPI analysis analyses KPI values 
achieved against the KPI objectives. It issues warnings and alarms if the resource KPI 
values achieved do not meet the KPI objectives. Resource KPI reporting produces 
performance summaries and trend reports of resource KPI values achieved. 



3.2 Process Flow 

The process flow shown in Fig. 2 is based on the assumption that customers subscribe to an SP for the services they use.

The SP also has an appropriate supplier/partner (S/P) relationship with one or more third-party SPs. The process flow shows how the SP aggregates its own data with the performance and usage data it receives from third-party SPs. The process flow steps shown in Fig. 2 are as follows:




Fig. 2. Service Quality Monitoring and SLA Violation Management Process Flow 

During normal operation, performance data that is used for monitoring service levels, as well as for longer-term capacity prediction, is collected on an ongoing basis from the service-providing infrastructure by Resource Data Collection & Processing (step 1 in Fig. 2). Likewise, performance data from external service components of third-party SPs is sent on an ongoing basis to S/P Performance Management for general monitoring of service levels (step 2). Resource Data Collection & Processing sends performance data to Resource Performance Management for further analysis (step 3). Resource Performance Management sends performance reports (step 4), and S/P Performance Management sends external component performance reports (step 5), to Service Quality Management for QoS calculations and to maintain statistical data on the supplied service instances. Service Quality Management analyses the performance reports received and sends overall service quality reports to Customer QoS/SLA Management so that it can monitor and report aggregate technology and service performance (step 6). Customer QoS/SLA Management checks the quality reports it receives against the individual customer SLA and establishes that no SLA violation has occurred (step 7).

B. Bhushan et al.



4 Information Model 

The model shown in Fig. 3 deals with SLA, QoS and the surrounding classes. It is based on input from the TM Forum SID [10] [11] [12] models, but simplified and specialised to meet the requirements of the work presented in this paper. The model presented here introduces a new relationship between KeyQualityIndicatorSLSParam and KeyPerformanceIndicatorSLSParam, which indicates that the KQI values used to determine compliance with service level objectives are derived from KPI values.
Fig. 3. QoS Information Model



4.1 Customer Contract View 

ProductSpecification. This is a collection of services offered to customers. Enterprises create product offerings from product specifications that have additional market-led details applied over a particular period of time. The relationship between ProductSpecification and SLA signifies what services are agreed with the customer.
ServiceSpecification. This is an abstract base class for defining the ServiceSpecification hierarchy. It defines the invariant characteristics of a Service and may also exist within groupings, such as within a Product. The relationship between ProductSpecification and ServiceSpecification signifies what services are offered to the customer.
ServiceLevelAgreement. An SLA is a formal negotiated agreement between two parties regarding the provided Service Level [13]. It is designed to create a common understanding about services, priorities, responsibilities, etc. It is usually written down explicitly and has legal consequences. Its main aim is to achieve and maintain a specified QoS in accordance with ITU-T and ITU-R Recommendations [13]. The procedures and targets it defines relate to specific service availability or performance.
ServiceLevelSpecification. This is a group of service level objectives, such as thresholds, metrics, and tolerances, along with the consequences that result from not meeting the objectives. Service level specifications are also characterised by exclusions associated with the specification.

ServiceLevelSpecObjectives. These objectives can be seen as a quality goal for a ServiceLevelSpecification, defined in terms of metrics and thresholds associated with the parameters. The conformanceTarget attribute determines whether ServiceLevelSpecObjectives are met. The attribute conformanceComparator specifies whether ServiceLevelSpecObjectives are violated above or below the conformanceTarget.
ServiceLevelSpecConsequences. This is an action that takes place in the event that a ServiceLevelObjective is not met. Violations of an SLA usually result in some consequence for the SP.
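The two attributes together determine when an objective is violated. The sketch below maps conformanceTarget and conformanceComparator to Python naming; the encoding of the comparator as the strings "above" or "below" (the side on which a violation occurs) is an assumption for illustration, not the SID's definition.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevelSpecObjective:
    """Simplified objective; value encoding of the comparator is assumed."""
    conformance_target: float
    conformance_comparator: str  # assumed: "above" or "below" = side on which violation occurs

    def is_violated(self, measured: float) -> bool:
        """True if the measured value lies on the violating side of the target."""
        if self.conformance_comparator == "above":
            return measured > self.conformance_target
        if self.conformance_comparator == "below":
            return measured < self.conformance_target
        raise ValueError(f"unknown comparator: {self.conformance_comparator}")
```

For a delay objective the comparator would be "above" (too slow is a violation); for an availability objective it would be "below".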

4.2 Service Quality View 

KeyQualityIndicatorSLSParam. ServiceLevelSpecification parameters can be of two types: KQIs and KPIs. A KQI provides a measurement of a specific aspect of the performance of a Product or a Service (i.e. ServiceSpecification). A KQI draws its data from a number of sources, including KPIs. This subsection deals with KQIs and the next subsection with KPIs. KeyQualityIndicatorSLSParam is the superclass of EndToEndAvailability, Accuracy and ServiceTransactionSpeed [14].

EndToEndAvailability KQI is the likelihood with which the relevant product or service can be accessed at the customer’s request. To determine the value of end-to-end availability, the aggregated values of MAN, EDN and CAN availability are measured and then combined into a single KQI value. The main attributes of this class are manAvailability, ednAvailabilityPercentage and canAvailability [5]. KQITransformationAlgorithm is a procedure used to calculate the KQI value. Accuracy defines the fidelity and completeness in carrying out the communication function. This KQI indicates the degree to which networks and services are dependable and error free. The attributes of this class are networkingAccuracy, serviceTransactionAccuracy, and contentInfoReceptionAccuracy [5]. ServiceTransactionSpeed defines the promptness with which the network and service complete end-user transactions. The main attributes of this class are serviceSessionSetupRate, serviceResponseTime, and contentInfoDeliveryRate [5].

4.3 Network Performance View 

KeyPerformanceIndicatorSLSParam. KPIs provide a measurement of a specific aspect of the performance of a service resource or group of service resources of the same type. A KPI is restricted to a specific resource type. This is the superclass of three subclasses: AvailabilityKPI, AccuracyKPI and ServiceTransactionSpeedKPI.

Fig. 4. OSS Component Architecture

AvailabilityKPI represents the KPIs whose values are required for the calculation of the EndToEndAvailability KQI value. The attributes of this subclass are pdpContextSet-upTime [7], httpRoundTripDelay [15], up-time, and down-time [5]. AccuracyKPI represents the KPIs whose values are required for the calculation of the Accuracy KQI value. The main attributes of this subclass are sduLossRatio [6], httpRoundTripDelay, BER (Bit Error Rate) [6] and jitter. ServiceTransactionSpeedKPI represents the KPIs whose values are required for the calculation of the ServiceTransactionSpeed KQI value. The main attributes are congestion [5], serviceResourceCapacity [16], serviceRequestLoad, pdpContextSet-upTime, and contentInfoDeliveryRate [5].

The SID models are not very clear about combining the values of two or more KQIs to form a single combined KQI value, as proposed in WSMH [9]. To remedy this, the information model presented here derives subclasses from KeyQualityIndicatorSLSParam that represent the combined KQIs, while the subclasses’ attributes represent the constituent KQIs.
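The design choice described above, a combined KQI modelled as a subclass whose attributes are the constituent KQIs, can be sketched as follows. The paper leaves the KQITransformationAlgorithm unspecified; this sketch assumes the MAN, EDN and CAN segments are in series and fail independently, so end-to-end availability is the product of the segment availabilities. That algorithm and the Python names are assumptions, not the authors' method.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class KeyQualityIndicatorSLSParam(ABC):
    """Superclass of combined KQIs (simplified from the information model)."""

    @abstractmethod
    def kqi_value(self) -> float:
        """Apply the KQI transformation algorithm to the constituent KQIs."""


@dataclass
class EndToEndAvailability(KeyQualityIndicatorSLSParam):
    # Constituent KQIs, each a fraction in [0, 1]
    man_availability: float
    edn_availability: float
    can_availability: float

    def kqi_value(self) -> float:
        # Assumed algorithm: segments in series with independent failures
        return self.man_availability * self.edn_availability * self.can_availability
```

A product of fractions can never exceed its weakest segment, which matches the intuition that end-to-end availability is bounded by each domain along the path.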



5 OSS Components and Validation Scenario 

The OSS functions have been validated by developing the components that provide those functions and then testing them on a 3G network with a location-based service in a validation scenario.

5.1 Components Architecture 

The component architecture implements the process flows described in Sect. 3. The process flows and the OSS functions are used as a blueprint to assemble a set of OSS components in an integrated management system (Fig. 4).

SLA Manager. This component is responsible for maintaining customer SLAs and for
ensuring that the delivered QoS meets the QoS specified in customer contracts or other
product specifications. It checks that the provisions in the contract are being met and
issues warnings and alarms if this is not the case. This means that it must be able to
monitor, analyse and report on the actual QoS achieved compared with the QoS specified
in the SLA.

Monitoring Analysis Component (MAC). MAC collects and aggregates values of
KPIs from network and service resources and transforms them into KQI values using
algorithms, for example, mathematical equations. Based on the values of one or more
KQIs and the service quality objectives, the MAC issues reports containing warnings or
alarms to the SLA Manager. These reports inform the SLA Manager that the service
quality is about to deteriorate, or has deteriorated, below a threshold value.
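The MAC behaviour just described can be illustrated with a minimal Python sketch. It assumes that KPI samples are aggregated by their mean, that the KQI algorithm is supplied as a function (here a hypothetical availability formula), and that "about to deteriorate" means the KQI lies within an assumed warning_margin above the objective. The report callback merely stands in for the SLA Manager; none of these names come from the original system.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class MonitoringAnalysisComponent:
    """Hypothetical MAC sketch: aggregates KPI samples, derives a KQI value,
    and reports warnings or alarms to the SLA Manager through a callback."""
    kqi_algorithm: Callable[[Dict[str, float]], float]  # aggregated KPIs -> KQI
    objective: float                                    # service quality objective
    warning_margin: float                               # assumed early-warning band
    report: Callable[[str, float], None]                # stands in for the SLA Manager
    samples: Dict[str, List[float]] = field(default_factory=dict)

    def collect(self, kpi_name: str, value: float) -> None:
        """Record one KPI sample received from a network or service resource."""
        self.samples.setdefault(kpi_name, []).append(value)

    def evaluate(self) -> Optional[str]:
        """Aggregate samples (mean, assumed), compute the KQI and report if needed."""
        aggregated = {name: mean(vals) for name, vals in self.samples.items()}
        kqi = self.kqi_algorithm(aggregated)
        if kqi < self.objective:
            level = "alarm"    # quality has deteriorated below the threshold
        elif kqi < self.objective + self.warning_margin:
            level = "warning"  # quality is about to deteriorate
        else:
            return None
        self.report(level, kqi)
        return level


def availability(kpis: Dict[str, float]) -> float:
    """Assumed KQI algorithm: percentage of time the service was up."""
    return 100.0 * kpis["up_time"] / (kpis["up_time"] + kpis["down_time"])


reports = []
mac = MonitoringAnalysisComponent(availability, objective=90.0, warning_margin=2.0,
                                  report=lambda level, value: reports.append((level, value)))
mac.collect("up_time", 3400.0)
mac.collect("down_time", 200.0)
print(mac.evaluate())  # None (KQI is about 94.4%)
mac.collect("down_time", 600.0)
print(mac.evaluate())  # alarm (mean down_time 400, KQI is about 89.5%)
```

Passing the KQI algorithm in as a function reflects the text's point that KPIs are turned into KQIs by pluggable algorithms, for example mathematical equations.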

Network Performance Evaluator (NPE). NPE measures resource performance in terms of
KPI values. The NPE is a rule-based component and operates in conjunction with
performance meters, which collect data from network and service resources. Meters
can be programmed by means of filter specifications, which are simple instruction
rules.
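The following minimal Python sketch shows how a meter programmed by filter specifications might work. The filter format used here (resource, raw metric, target KPI) is a hypothetical simplification: the text only says that filter specifications are simple instruction rules, so the FilterSpec and Record types and the sample identifiers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FilterSpec:
    """Hypothetical filter specification: one simple rule that selects raw
    measurement records and names the KPI they contribute to."""
    resource: str  # identifier of the network or service resource
    metric: str    # raw metric name reported by that resource
    kpi: str       # KPI the matching records feed


@dataclass(frozen=True)
class Record:
    """A raw measurement collected from a resource."""
    resource: str
    metric: str
    value: float


class PerformanceMeter:
    """A meter programmed with filter specifications."""
    def __init__(self, filters: Iterable[FilterSpec]) -> None:
        self.filters = list(filters)

    def measure(self, records: Iterable[Record]) -> List[Tuple[str, float]]:
        """Apply every rule to every record; emit (KPI name, value) pairs."""
        out: List[Tuple[str, float]] = []
        for rec in records:
            for rule in self.filters:
                if rec.resource == rule.resource and rec.metric == rule.metric:
                    out.append((rule.kpi, rec.value))
        return out


meter = PerformanceMeter([FilterSpec("web-server-1", "rtt_ms", "httpRoundTripDelay")])
records = [Record("web-server-1", "rtt_ms", 120.0),
           Record("web-server-2", "rtt_ms", 300.0),
           Record("web-server-1", "cpu_load", 0.5)]
print(meter.measure(records))  # [('httpRoundTripDelay', 120.0)]
```

Reprogramming the meter only requires a different list of rules, which is consistent with the rule-based design attributed to the NPE.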

Report Log. The Report Log component receives service quality reports from the 
MAC and stores them in a log. It also allows the service provider to view and browse 
through the report log via a web browser. 

Supporting Components. Two further components, the FMA (Federated Mediation
Adaptor) and the CSC (Customer Service Contract) Manager, also take part in SLA
management. The CSC Manager provides methods for the management of SLA data.
The FMA component collects SLA violation reports and forwards them to a rating and
billing facility.



5.2 Validation Scenario 

The main objective of the validation scenario is to demonstrate that the end-to-end
service quality of 3G services can be measured and reported in order to monitor SLA
violations. The SP, shown in Fig. 4, integrates a location-based service and a virtual
home environment service, both of which are provided by third-party SPs. The end-user
client is the single point of access and authentication for the customer. The SP obtains
performance data from each of the third-party SPs and from the Mobile Network Provider,
and aggregates and analyses the data to establish the overall service quality and to
produce overall QoS reports. If an SLA violation occurs, violation reports are sent to
the Billing process. An algorithm is used to measure service availability as a percentage
value, which is compared against a service quality objective. In the scenario, the
objective for overall service availability is set to 90%, and an alarm is raised if the
service availability value falls below 90%. A software system implementing this
validation scenario has been successfully demonstrated in the trials of an EU IST project
called AlbatrOSS [17], and fuller details on validation can be found in [18]. The concepts
presented in this paper are also being modified and further developed in another IST
project called Daidalos [19].
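The availability check in the validation scenario can be written as a short Python sketch. The 90% objective comes from the scenario. The formula up-time / (up-time + down-time), based on the up-time and down-time attributes of AvailabilityKPI, is an assumption, because the text does not give the exact algorithm used.

```python
AVAILABILITY_OBJECTIVE = 90.0  # percent, from the validation scenario


def availability_percent(up_time: float, down_time: float) -> float:
    """Service availability as a percentage (assumed formula: up / (up + down))."""
    total = up_time + down_time
    if total <= 0:
        raise ValueError("observation period must be positive")
    return 100.0 * up_time / total


def raise_alarm(up_time: float, down_time: float) -> bool:
    """True when availability falls below the 90% objective."""
    return availability_percent(up_time, down_time) < AVAILABILITY_OBJECTIVE


print(availability_percent(3420.0, 180.0))  # 95.0
print(raise_alarm(3420.0, 180.0))           # False
print(raise_alarm(3060.0, 540.0))           # True (85.0%)
```

A value of exactly 90% does not raise an alarm, since the alarm is raised only when availability falls below the objective.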



5.3 Component Operations 

This section uses a UML Message Sequence Diagram (Fig. 5) to illustrate how the OSS
components interact in order to achieve the goal of the validation scenario.

Fig. 5. Service Quality Measurement and SLA Violation Reporting



The sequence begins when the end user logs on and begins to use the service. This
is the SLA initialisation phase, in which the SLA Manager obtains the SLS details for
the service being used by the end user and creates an instance of the MAC. The MAC
continually interacts with the NPE to obtain the values of KPIs from web-based
servers running over the UMTS network, computes the KQI values, and compares them
against the service quality objective set in the SLS. When a KQI value falls below the
objective, the MAC informs the SLA Manager, since this may have consequences that
require an action to be taken. For example, the action may state that the end user is
entitled to receive a discount. In this case, a report stating the action to be taken is
sent to the FMA, which forwards it to a Rating facility.
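The message flow of the sequence diagram can be traced with a small Python simulation. Every class here is a hypothetical stand-in for the corresponding OSS component. The NPE simply replays fixed KPI readings, the KQI is an assumed availability formula, and the "discount" action is the example consequence mentioned in the text. The sketch shows the order of interactions only, not the actual AlbatrOSS interfaces.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class SLS:
    service: str
    availability_objective: float  # e.g. 90.0 (%), as in the validation scenario


class NPE:
    """Stub NPE that replays KPI readings; a real NPE would query meters."""
    def __init__(self, readings: List[Dict[str, float]]) -> None:
        self._readings: Iterator[Dict[str, float]] = iter(readings)

    def get_kpis(self) -> Dict[str, float]:
        return next(self._readings)


class RatingFacility:
    def __init__(self) -> None:
        self.received: List[dict] = []

    def rate(self, report: dict) -> None:
        self.received.append(report)


class FMA:
    """Forwards SLA violation reports to the rating facility."""
    def __init__(self, rating: RatingFacility) -> None:
        self.rating = rating

    def forward(self, report: dict) -> None:
        self.rating.rate(report)


class MAC:
    """Created per session by the SLA Manager; compares KQIs with the SLS."""
    def __init__(self, user: str, sls: SLS, npe: NPE, manager: "SLAManager") -> None:
        self.user, self.sls, self.npe, self.manager = user, sls, npe, manager

    def poll_once(self) -> float:
        kpis = self.npe.get_kpis()
        # Assumed KQI algorithm: availability from up-time and down-time.
        kqi = 100.0 * kpis["up_time"] / (kpis["up_time"] + kpis["down_time"])
        if kqi < self.sls.availability_objective:
            self.manager.quality_below_objective(self.user, self.sls, kqi)
        return kqi


class SLAManager:
    def __init__(self, sls_store: Dict[str, SLS], fma: FMA) -> None:
        self.sls_store, self.fma = sls_store, fma

    def on_logon(self, user: str, service: str, npe: NPE) -> MAC:
        """SLA initialisation: look up the SLS and create a MAC instance."""
        return MAC(user, self.sls_store[service], npe, self)

    def quality_below_objective(self, user: str, sls: SLS, kqi: float) -> None:
        # Hypothetical consequence: the end user is entitled to a discount.
        self.fma.forward({"user": user, "service": sls.service,
                          "kqi": kqi, "action": "discount"})


rating = RatingFacility()
manager = SLAManager({"location": SLS("location", 90.0)}, FMA(rating))
npe = NPE([{"up_time": 95.0, "down_time": 5.0}, {"up_time": 80.0, "down_time": 20.0}])
mac = manager.on_logon("alice", "location", npe)
mac.poll_once()  # 95.0: objective met, nothing reported
mac.poll_once()  # 80.0: below objective, report reaches the rating facility
print(rating.received)
# [{'user': 'alice', 'service': 'location', 'kqi': 80.0, 'action': 'discount'}]
```

The MAC keeps a reference to the SLA Manager that created it, so the violation travels MAC → SLA Manager → FMA → rating facility, in the same order as the steps described above.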

6 Summary and Conclusions 

The main theme of this paper is the measurement and aggregation of service quality in a
mobile service provision chain, and the analysis of service quality against certain
objectives. In this scenario, interdependencies among OSS functions are more pronounced
than in traditional OSSs. This paper attempts to map out such interdependencies.

This paper presents an OSS that is based on the requirements of personalised mobile
data services. The focus is mainly on the specific functionality in an OSS for 3G
and beyond mobile networks, although traditional OSS functions that may need to be
extended for the mobile data environment have also been considered. These functions
define the scope of the system; their specification lays the foundation for the
component and interface design in the management system architecture. The QoS
information requirements of the specified functions have also been considered, and a
comprehensive information model has been produced. It has been designed as a light
and flexible structure able to support the key OSS functions defined by industrial
consortia such as 3GPP and the TMForum. This work provides practical insight into the
way the TMForum SID models can be applied to develop a QoS management system. The
paper also presents an OSS component architecture that implements the OSS functions
defined in the paper. The validation scenario has provided the opportunity to test the
OSS functions and the information model.